ABSTRACT


Because of the high cost of fabricating an Integrated Circuit (IC), it is important to verify the design using simulation. There is a wide variety of techniques for simulating integrated circuit designs, but the most accurate and reliable is to construct the system of nonlinear ordinary differential equations that describe a given circuit, and to solve the system with a numerical integration method. This approach, referred to as circuit simulation, is computationally expensive, particularly when applied to large circuits. To reduce the computation time required to simulate large MOS circuits, new numerical integration algorithms based on relaxation techniques have been developed. These techniques can reduce the simulation time by as much as an order of magnitude compared with standard circuit simulation programs. In addition, they are particularly suited for parallel implementation. This thesis covers both the classical numerical techniques and the new relaxation-based algorithms, with particular emphasis on the Waveform Relaxation (WR) family of algorithms. Algorithms in this family are reviewed, convergence theorems are included, and their implementations on a parallel processor are presented.
CHAPTER 1 - INTRODUCTION


Reliable and accurate simulation tools must play a key role in Integrated Circuit (IC) design. This is because fabricating an integrated circuit is expensive and often time-consuming (on the order of months). In addition, minor errors in an integrated circuit design cannot usually be corrected after fabrication. Therefore, design errors must be uncovered before fabrication, and this can be done through the use of simulation.

There is a wide variety of techniques for simulating integrated circuit designs, but none are as accurate, reliable, and technology independent as constructing the system of nonlinear ordinary differential equations that describe a given circuit, and solving this system with a numerical integration method. This approach, referred to as circuit simulation, has been implemented in a variety of programs such as SPICE [2] and ASTAP [3]. These programs use a standard, or direct, technique based on the following four steps:

i) An extended form of the nodal analysis technique to construct a system of differential equations from the circuit topology.

ii) Stiffly stable implicit integration methods, such as the Backward Difference Formulas, to convert the differential equations that describe the system into a sequence of nonlinear algebraic equations.

iii) Modified Newton methods to solve the algebraic equations by solving a sequence of linear problems.

iv) Sparse Gaussian elimination to solve the systems of linear equations generated by the Newton method.
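The four steps can be illustrated with a minimal sketch. It is not SPICE or ASTAP, and it makes several assumptions. The nodal analysis of step (i) is assumed to have already produced a constant matrix C and a function f for $C\dot{x} = f(x)$. Step (ii) uses backward Euler, the first-order Backward Difference Formula. Step (iii) uses a full (not modified) Newton iteration with a forward-difference Jacobian. Step (iv) uses dense Gaussian elimination with partial pivoting in place of a sparse solver. The names `gauss_solve`, `backward_euler_step`, and `simulate` are hypothetical.

```python
def gauss_solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting (step iv, dense)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    y = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = sum(M[k][c] * y[c] for c in range(k + 1, n))
        y[k] = (M[k][n] - s) / M[k][k]
    return y


def backward_euler_step(C, f, x_prev, h, tol=1e-12, max_iter=50, eps=1e-7):
    """One backward-Euler step (step ii): solve C (x - x_prev)/h - f(x) = 0 by Newton (step iii)."""
    n = len(x_prev)
    x = list(x_prev)  # initial Newton guess: the previous timepoint
    for _ in range(max_iter):
        fx = f(x)
        r = [sum(C[i][j] * (x[j] - x_prev[j]) for j in range(n)) / h - fx[i]
             for i in range(n)]
        # Jacobian of the residual: C/h - df/dx, with df/dx by forward differences
        J = [[C[i][j] / h for j in range(n)] for i in range(n)]
        for j in range(n):
            xp = list(x)
            xp[j] += eps
            fp = f(xp)
            for i in range(n):
                J[i][j] -= (fp[i] - fx[i]) / eps
        dx = gauss_solve(J, [-ri for ri in r])
        x = [xi + di for xi, di in zip(x, dx)]
        if max(abs(d) for d in dx) < tol:
            break
    return x


def simulate(C, f, x0, h, steps):
    """March backward Euler over `steps` fixed timesteps of size h."""
    xs = [list(x0)]
    for _ in range(steps):
        xs.append(backward_euler_step(C, f, xs[-1], h))
    return xs
```

Consider the scalar test equation $\dot{x} = -x$ with $h = 0.1$. Ten calls reproduce the backward-Euler value $(1/1.1)^{10}$. A production simulator would also vary h, reuse Jacobians as a modified Newton method does, and exploit sparsity.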

Circuit simulation tools based on the above techniques are heavily used. Companies spend many millions of dollars per year in computer costs, and a number of companies run over 60,000 simulations per month. However, these programs were designed in the early 1970s for the simulation of circuits with a few hundred transistors at most. They are now being applied, somewhat inappropriately, to the task of simulating digital and analog VLSI circuits, which can contain more than 50,000 devices. As problems increase in size, it becomes less economically feasible to use the above direct techniques. SPICE [2] and ASTAP [3] can take several hours (on a VAX 11/780) to simulate circuits with only a few hundred devices.

There are two reasons why the direct approach described above can become inefficient for large systems. The most obvious reason is that sparse matrix solution time grows superlinearly with the size of the problem. Experimental evidence indicates that matrix solution time begins to dominate when the system has more than several thousand nodes, and systems of this size are beginning to be simulated for new IC designs.

The direct methods also become inefficient for large problems because, in large differential equation systems, the different state variables change at very different rates. Direct application of the integration method forces every differential equation in the system to be discretized identically, and this discretization must be fine enough that the fastest-changing state variable in the system is accurately represented. Suppose it were possible to pick different discretization points, or timesteps, for each differential equation in the system, so that each could use the largest timestep that accurately reflects the behavior of its associated state variable. The efficiency of the simulation would then be greatly improved.
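The potential saving can be made concrete with a back-of-the-envelope count. The sketch below assumes a hypothetical accuracy rule that is not taken from this thesis: each state variable with time constant tau needs timesteps no larger than tau/10. With a global timestep, every equation is stepped at the rate of the fastest variable. With per-equation (multirate) timesteps, each equation uses its own largest acceptable step.

```python
import math


def steps_needed(tau, T, points_per_tau):
    """Hypothetical accuracy rule: h = tau / points_per_tau for a variable with time constant tau."""
    h = tau / points_per_tau
    return math.ceil(T / h - 1e-9)  # small guard against round-off in T / h


def global_vs_multirate(taus, T, points_per_tau=10):
    """Total equation-steps on [0, T] with one global timestep vs. per-equation timesteps."""
    per_equation = [steps_needed(tau, T, points_per_tau) for tau in taus]
    global_steps = len(taus) * max(per_equation)  # all equations follow the fastest one
    multirate_steps = sum(per_equation)           # each equation uses its own step
    return global_steps, multirate_steps


print(global_vs_multirate([1e-3, 1.0, 1.0], 1.0))  # (30000, 10020)
```

Take one fast variable and two slow ones on [0,1]. Under this assumed rule the global-timestep count is three times that of the fastest equation, while the multirate count barely exceeds it. The gap widens as a larger fraction of the system is slow.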

Several modifications of the direct method have been used that both avoid large sparse matrix solutions and allow the individual equations of the system to use different timesteps [4,5,6,7,8,9,10,11]. One class of such techniques, Waveform Relaxation [11,12,13,14,15,16,17,18], is based on "lifting" the Gauss-Seidel and Gauss-Jacobi relaxation techniques for solving large algebraic systems to the problem of solving the large systems of ordinary differential equations associated with MOS digital circuits. Briefly, the idea of these relaxation techniques is to first break a large circuit into loosely coupled subcircuits. Then the behavior of each subcircuit over some interval of time is calculated by "guessing" the behavior of the surrounding subcircuits over the same interval of time. The responses of the subcircuits are used to improve these guesses, and the responses are recalculated. The procedure is iterated until convergence is achieved for each subcircuit over the interval of time. Other relaxation techniques, such as the Gauss-Seidel-Newton algorithm [21], can be applied in place of the standard Newton-Raphson techniques to solve the nonlinear system of algebraic equations.
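As an illustration of the Gauss-Seidel form of Waveform Relaxation, the sketch below treats each equation of a small linear system $\dot{x} = Ax$ as its own "subcircuit". In each sweep, one equation is solved over the whole interval with backward Euler, using the most recent waveforms of the other variables as known inputs. The example matrix, timestep, iteration count, and function names are assumptions chosen for illustration. This is not the RELAX implementation.

```python
def gauss_seidel_wr(A, x0, h, steps, iterations):
    """Gauss-Seidel waveform relaxation for x' = A x, one equation per subsystem.

    Each subsystem i solves x_i' = A[i][i] x_i + sum_{j != i} A[i][j] x_j(t)
    by backward Euler over the whole interval, treating the other waveforms
    x_j(t) as known inputs taken from the latest available iterate.
    """
    n = len(x0)
    # Initial guess: constant waveforms equal to the initial condition.
    waves = [[x0[i]] * (steps + 1) for i in range(n)]
    for _ in range(iterations):
        for i in range(n):
            new = [x0[i]]
            for k in range(steps):
                coupling = sum(A[i][j] * waves[j][k + 1] for j in range(n) if j != i)
                new.append((new[k] + h * coupling) / (1.0 - h * A[i][i]))
            waves[i] = new  # Gauss-Seidel: later subsystems use this waveform at once
    return waves


def backward_euler_direct_2x2(A, x0, h, steps):
    """Reference: backward Euler on the full coupled 2x2 system, (I - hA) x_{k+1} = x_k."""
    a, b = 1.0 - h * A[0][0], -h * A[0][1]
    c, d = -h * A[1][0], 1.0 - h * A[1][1]
    det = a * d - b * c
    xs = [list(x0)]
    for _ in range(steps):
        p, q = xs[-1]
        xs.append([(d * p - b * q) / det, (a * q - c * p) / det])  # Cramer's rule
    return xs


A = [[-2.0, 1.0], [1.0, -2.0]]
w = gauss_seidel_wr(A, [1.0, 0.0], 0.05, 40, 30)
ref = backward_euler_direct_2x2(A, [1.0, 0.0], 0.05, 40)
print(max(abs(w[i][k] - ref[k][i]) for i in range(2) for k in range(41)))
```

Both subsystems are discretized on the same grid. The relaxation iterates therefore converge to the backward-Euler solution of the coupled system, which the final comparison measures.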

Two circuit simulation programs have been developed at Berkeley using relaxation techniques: RELAX, based on Waveform Relaxation [11,18], and SPLICE, based on Iterated Timing Analysis (ITA) [33], a form of the Gauss-Seidel-Newton technique. On a uniprocessor, these programs can show speed improvements over direct methods of up to an order of magnitude, even for problems with only a few hundred devices. In addition, both ITA and Waveform Relaxation are particularly amenable to use on multiprocessors because the computational method already decomposes the problem. A distributed form of the ITA algorithm, called DITA, has recently been developed, and a prototype DITA simulator, the MSPLICE program, has been implemented [34].

In this thesis I present a complete and consistent study of the existing body of research on applying numerical integration methods to the differential systems that describe circuits. I then present new theoretical and practical results on the application of WR to numerically solving the differential equations generated from circuits, on both serial and parallel processors.

I start in Chapter 2 with an introduction to the circuit simulation problem, beginning with how the differential equations that describe a circuit are formulated from the circuit topology. Then, those aspects of the circuit simulation problem that play a role in the choice of numerical method are described. The well-known issues of consistency and stiff stability [1] are mentioned briefly, as is a consistent interpretation of the charge conservation property [41]. The chapter ends with the description of a new property that can be used to classify integration methods, that of exhaustive domain of dependence.

In Chapter 3, many of the integration methods that have been applied to circuit simulation problems are analyzed with respect to the properties described in Chapter 2. The standard multistep integration methods are analyzed first, and it is proved that the implicit multistep integration methods commonly used in circuit simulation have all the desirable properties given in Chapter 2. Following this, the relaxation algorithms that have been used to solve the large algebraic systems generated by implicit integration methods are described [21,33], and a theorem guaranteeing the convergence of such methods for small timesteps is proved [12]. Then, the semi-implicit integration algorithms used in special-purpose timing simulation programs [5,6,7,8] are analyzed with respect to their domain of dependence and stability properties. The chapter ends by comparing the semi-implicit and relaxation algorithms.

The theoretical basis for the family of WR algorithms, methods for the decomposed solution of differential equations, is presented in Chapter 4. Waveform relaxation is introduced with a simple example, followed by a general algorithm. Then a new proof of WR convergence is presented, one that demonstrates that the WR algorithm is a contraction mapping in a particular norm. Extensions to the basic algorithm that allow for modified iteration equations are presented, and it is shown that the convergence of such extensions follows directly from the proof that the WR algorithm is a contraction mapping. Following this, an extension of the Newton method to function spaces is presented, and its convergence is proved using lemmas from the basic theorem. The waveform Newton algorithm is then combined with the WR algorithm to produce a waveform relaxation-Newton (WRN) algorithm [22].

To compute the iteration waveforms for the WR algorithm, it is usually necessary to solve systems of nonlinear ordinary differential equations. If multistep integration formulas are used to solve for the iteration waveforms, the numerical integration method plays a role in the convergence properties of this discretized WR algorithm [29]. In Chapter 5, the interaction between WR algorithms and multistep integration methods is considered in detail. The discretized WR algorithm is analyzed first under the assumption that every differential equation in the system is discretized identically (the global-timestep case). A simple example is presented that demonstrates a possible breakdown of the WR method under discretization. Then, a comparison is drawn between the discretized WR algorithm and the algebraic relaxation methods described in Chapter 3, and a strong comparison theorem for linear systems is proved. Following this, a convergence theorem for the fixed global-timestep discretized WR algorithm is presented. The global-timestep restriction is then lifted, and the first theorem proving the convergence of the multirate WR algorithm is presented.

In Chapter 6, the theoretical background for two techniques for accelerating WR convergence is presented. First, we examine why breaking the simulation interval into pieces, called windows, can reduce the number of relaxation iterations required to achieve convergence [17]. Then we consider how to partition large systems into subsystems in such a way that the WR algorithm converges rapidly [31].

The implementation of the WR algorithm in the RELAX2.3 program is described in Chapter 7. The partitioning, numerical integration, windowing, and partial waveform convergence algorithms, as applied to MOS circuits, are presented. The results from simulating a CMOS memory circuit are analyzed in order to demonstrate more clearly both the practicality of the WR algorithm and the specific nature of its efficiencies. The chapter concludes with a table of results from the RELAX2.3 program applied to a variety of MOS circuits.

The implementation of two WR-based parallel circuit simulation algorithms on a shared-memory computer is described in Chapter 8 [17]. A brief overview of the aspects of a shared-memory computer that affect the algorithm implementation is presented, followed by the description of, and experimental results from, the two parallel WR algorithms.
CHAPTER 2 - THE CIRCUIT SIMULATION PROBLEM

As mentioned in the introduction, circuit simulation amounts to numerically solving the system of nonlinear ODEs that describe the dynamic behavior of a circuit. This chapter addresses two topics. The first is the construction of a system of differential equations from a given circuit topology, together with the properties of that system. The second is the set of issues to consider when choosing a numerical method for solving it.


SECTION 2.1 - THE EQUATION SYSTEM

The most general formulation of a system of nonlinear differential equations is the following implicit formulation:

$$F(\dot{x}(t), x(t), u(t)) = 0, \qquad x(0) = x_0 \tag{2.1}$$

where $x(t) \in \mathbb{R}^n$ on $t \in [0,T]$; $u(t) \in \mathbb{R}^r$ on $t \in [0,T]$ is piecewise continuous; and $F: \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R}^r \rightarrow \mathbb{R}^n$ is continuous.

Before considering techniques for numerical solution, we must first guarantee that Eqn. (2.1) has a solution. Suppose we require that there exists a transformation of Eqn. (2.1) to the form $\dot{y} = f(y,u)$, where $f$ is Lipschitz continuous with respect to $y$ for all $u$. Then a unique solution for the system exists [39]. Although there are many sets of broad constraints on $F$ that guarantee the existence of such a transformation, the conditions can be difficult to verify in practice. Rather than carefully considering the existence question, which would complicate the analyses that follow without lending much insight, we will consider the following less general form, which can describe most circuit simulation problems:


$$C(x(t), u(t))\,\dot{x}(t) = f(x(t), u(t)), \qquad x(0) = x_0 \tag{2.2}$$
where $x(t) \in \mathbb{R}^n$ on $t \in [0,T]$; $u(t) \in \mathbb{R}^r$ on $t \in [0,T]$ is piecewise continuous; $C: \mathbb{R}^n \times \mathbb{R}^r \rightarrow \mathbb{R}^{n \times n}$ is such that $C(x,u)^{-1}$ exists and is uniformly bounded with respect to $x$ and $u$; and $f: \mathbb{R}^n \times \mathbb{R}^r \rightarrow \mathbb{R}^n$ is globally Lipschitz continuous with respect to $x$ for all $u(t) \in \mathbb{R}^r$.

The fact that $C(x,u)$ has a well-behaved inverse guarantees the existence of a normal form for Eqn. (2.2), $\dot{x} = C(x,u)^{-1} f(x,u)$, and shows that $x(t) \in \mathbb{R}^n$ is the vector of state variables for the system. Then, since $f$ is globally Lipschitz continuous with respect to $x$ for all $u$, $C(x,u)^{-1}$ is uniformly bounded, and $u(t)$ is piecewise continuous, there exists a unique solution to Eqn. (2.2) on any finite interval $[0,T]$ [39].

SECTION 2.1.1 - CONSTRUCTING THE EQUATION SYSTEM

The behavior of the most commonly modeled nonlinear circuit elements (diodes, bipolar transistors, and MOS transistors) can be described by voltage-controlled current and charge equations. For example, consider the diode in Fig. 2.1 for the case where the voltage across the diode satisfies $V_d < 0.0$. Then the anode and cathode currents, $i_a$ and $i_c$ respectively, and the anode and cathode charges, $q_a$ and $q_c$ respectively, can be computed (to first order) from the following equations:

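$$
\begin{aligned}
i_a &= I_s\left(e^{V_d/V_t} - 1\right) \\
i_c &= -i_a \\
q_a &= 2 C_0 \phi \left(1 - \sqrt{1 - V_d/\phi}\right) \\
q_c &= -q_a
\end{aligned}
$$

where $I_s$ is the saturation current, $V_t$ is the thermal voltage, $C_0$ is the zero-bias junction capacitance, and $\phi$ is the junction potential.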
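The diode equations translate directly into a small function. In the sketch below the current law follows the text. The junction charge uses the common abrupt-junction form $q_a = 2C_0\phi(1 - \sqrt{1 - V_d/\phi})$, whose derivative is the familiar depletion capacitance $C_0/\sqrt{1 - V_d/\phi}$. That charge expression, all default parameter values, and the function names are assumptions for illustration. A central difference of the charge recovers the capacitance, which is the quantity that ends up in the $C(x,u)$ matrix.

```python
import math


def diode(v_d, i_s=1e-14, v_t=0.0259, c0=1e-12, phi=0.7):
    """Anode/cathode currents and charges of a diode for v_d < 0 (illustrative values)."""
    if v_d >= phi:
        raise ValueError("assumed charge model requires v_d < phi")
    i_a = i_s * (math.exp(v_d / v_t) - 1.0)                    # exponential current law
    i_c = -i_a                                                 # same current leaves the cathode
    q_a = 2.0 * c0 * phi * (1.0 - math.sqrt(1.0 - v_d / phi))  # assumed abrupt-junction charge
    q_c = -q_a                                                 # equal and opposite stored charge
    return i_a, i_c, q_a, q_c


def junction_capacitance(v_d, dv=1e-6, **params):
    """dq_a/dV_d by central differences."""
    q_plus = diode(v_d + dv, **params)[2]
    q_minus = diode(v_d - dv, **params)[2]
    return (q_plus - q_minus) / (2.0 * dv)
```

In reverse bias the anode current settles at approximately $-I_s$, and the numerical capacitance matches $C_0/\sqrt{1 - V_d/\phi}$.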

For an arbitrary circuit made up of a network of elements described by voltage-controlled current and charge equations, it is possible to construct a system of differential equations that describes the circuit by using nodal analysis [36]. This amounts to applying the relationship that the time derivative of charge, $\dot{q}$, is a current, and insisting that the sum of the currents leaving each node in the network (currents entering the node are assigned a negative sign) is precisely zero (Kirchhoff's Current Law, KCL). That is, for each node $i$ in the network:

$$\sum_{\text{elements at } i} i_k(v(t), u(t)) + \sum_{\text{elements at } i} \frac{d}{dt}\, q_k(v(t), u(t)) = 0 \tag{2.3}$$

where $v(t) \in \mathbb{R}^n$ is the vector of node voltages, $u(t) \in \mathbb{R}^r$ is the vector of inputs, and $i_k$ and $q_k$ are the current and charge of the $k$-th element incident at node $i$. If a system were constructed using the KCL equations for every node in the circuit, the system would be overdetermined. For this reason, the equation for an arbitrary node in the circuit, referred to as the reference or ground node, is discarded. In addition, the KCL equations for the nodes whose voltages are known a priori (e.g., a node connected to a voltage source whose other terminal is connected to the reference node) are discarded.
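For networks of linear two-terminal capacitors and conductors, the KCL construction reduces to "stamping" each element into two matrices, giving $C\dot{v} + Gv = 0$. The sketch below uses a hypothetical netlist format of `(node_a, node_b, value)` tuples. Node 0 is the reference node, and its KCL row is discarded. Voltage sources and nonlinear elements are omitted.

```python
def assemble_nodal(n_nodes, capacitors, conductors):
    """Build C and G for C dv/dt + G v = 0 from KCL at every non-reference node.

    Node 0 is the reference (ground) node: its KCL row is discarded, and a
    branch to it contributes only to the diagonal of the other node's row.
    """
    n = n_nodes - 1  # unknowns are the voltages of nodes 1 .. n_nodes-1
    C = [[0.0] * n for _ in range(n)]
    G = [[0.0] * n for _ in range(n)]
    for matrix, elements in ((C, capacitors), (G, conductors)):
        for a, b, value in elements:
            # A branch value*(v_a - v_b) leaves node a and enters node b.
            for p, q in ((a, b), (b, a)):
                if p == 0:
                    continue  # reference-node equation is discarded
                matrix[p - 1][p - 1] += value
                if q != 0:
                    matrix[p - 1][q - 1] -= value
    return C, G


C, G = assemble_nodal(3, [(1, 0, 1.0), (2, 0, 2.0), (1, 2, 0.5)],
                      [(1, 0, 3.0), (2, 0, 4.0)])
print(C, G)
```

Here grounded capacitors at both nodes, a coupling capacitor between them, and grounded conductances produce a symmetric $C$ with negative off-diagonal entries and a diagonal $G$. This matches the structure of the two-node example discussed later in this section.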

As an example, consider the NAND gate in Fig. 2.2. In order to solve for the unknown voltages $v_1$ and $v_2$, we need only form the KCL equations at node 1 and node 2. The KCL equations for the nodes connected to the voltage sources and ground can be ignored. For the first node we have the equation:


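$$i_{dm1}(v_1, V_a, 0) - i_{dm2}(v_2, V_b, v_1) + \frac{d}{dt}\left[q_{dm1}(v_1, V_a, 0) + q_{sm2}(v_2, V_b, v_1) + c_1 v_1\right] = 0$$

and for the second node,

$$i_{dm2}(v_2, V_b, v_1) + g_L(v_2 - V_{dd}) + \frac{d}{dt}\left[q_{dm2}(v_2, V_b, v_1) + c_2 v_2\right] = 0$$

where $i_{dm1}$ and $i_{dm2}$ are the currents flowing from the drain to the source of transistors m1 and m2 respectively. The terms $q_{dm1}$, $q_{sm2}$, and $q_{dm2}$ are the charges accumulated at the drain of transistor m1 and at the source and drain of transistor m2, respectively.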
In general, nodal analysis leads to systems of the form

$$\frac{d}{dt}\, q(v(t), u(t)) - i(v(t), u(t)) = 0 \tag{2.4}$$

where $q: \mathbb{R}^n \times \mathbb{R}^r \rightarrow \mathbb{R}^n$ is the vector of the sums of the charges at each node, $i: \mathbb{R}^n \times \mathbb{R}^r \rightarrow \mathbb{R}^n$ is the vector of the sums of the currents entering each node, $v \in \mathbb{R}^n$ is the node voltage vector, and $u \in \mathbb{R}^r$ is the vector of inputs. The system in Eqn. (2.4) can be converted to the form of Eqn. (2.2) by applying the chain rule to establish the identity

$$\frac{d}{dt}\, q(v(t), u(t)) = \frac{\partial q}{\partial v}(v(t), u(t))\,\dot{v}(t) + \frac{\partial q}{\partial u}(v(t), u(t))\,\dot{u}(t).$$

We then define $x = v$, $C(x(t), u(t)) = \frac{\partial q}{\partial v}(v(t), u(t))$, and

$$f(x(t), u(t)) = i(v(t), u(t)) - \frac{\partial q}{\partial u}(v(t), u(t))\,\dot{u}(t)$$

to get a system of the form of Eqn. (2.2). Note that in order for the $f$ defined above to satisfy the Lipschitz continuity property, either $\frac{\partial q}{\partial u}$ must be zero, or $\dot{u}$ must be bounded.
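The conversion from Eqn. (2.4) to Eqn. (2.2) can be checked numerically. The sketch below evaluates $C = \partial q/\partial v$ and $f = i - (\partial q/\partial u)\,\dot{u}$ at a single time point using central differences. The function names and the one-node test circuit are hypothetical. The test circuit has a capacitor $c_1$ to ground, a capacitor $c_c$ coupling the node to the input, and a conductance $g$ to ground.

```python
def jacobian(func, point, eps=1e-7):
    """Central-difference Jacobian of a vector function at `point`."""
    base = func(point)
    cols = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(fp, fm)])
    # cols[j][r] = d func_r / d point_j; transpose into rows
    return [[cols[j][r] for j in range(len(point))] for r in range(len(base))]


def to_normal_form(q, i, v, u, u_dot):
    """C(x,u) and f(x,u) of Eqn. (2.2) at one time point, with x = v."""
    C = jacobian(lambda vv: q(vv, u), v)       # dq/dv
    dq_du = jacobian(lambda uu: q(v, uu), u)   # dq/du
    currents = i(v, u)                         # sums of currents entering each node
    f = [currents[r] - sum(dq_du[r][c] * u_dot[c] for c in range(len(u)))
         for r in range(len(v))]
    return C, f


c1, cc, g = 1.0, 0.5, 2.0
C, f = to_normal_form(lambda v, u: [c1 * v[0] + cc * (v[0] - u[0])],
                      lambda v, u: [-g * v[0]],
                      [0.3], [1.0], [4.0])
print(C, f)  # approximately [[1.5]] and [1.4]
```

Here $C = c_1 + c_c$ and $f = -g v + c_c \dot{u}$. The input's rate of change therefore enters $f$ as a current, which is why $\dot{u}$ must be bounded when $\partial q/\partial u \neq 0$.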

For a broad class of circuits, the matrix $C(x,u) = \frac{\partial q}{\partial v}(v,u)$ is strictly diagonally dominant uniformly in $x$, a property which guarantees the existence of a bounded inverse [28]. Many of the results concerning relaxation methods for systems of the form of Eqn. (2.2) rely on this diagonal dominance property. We therefore describe the conditions under which a circuit produces a $C(x,u)$ that is diagonally dominant.

Consider the two-node example in Fig. 2.3. Applying the nodal analysis technique described above yields the following differential equations:

$$(C_1 + C_f)\,\dot{v}_1(t) - C_f\,\dot{v}_2(t) = -g_1 v_1(t)$$

$$(C_2 + C_f)\,\dot{v}_2(t) - C_f\,\dot{v}_1(t) = -g_2 v_2(t)$$
As this example demonstrates, for circuits whose only charge elements are capacitors, the $i$-th diagonal entry of the $C$ matrix is the sum of the capacitances incident at node $i$, and the $ij$-th entry is the negative of the capacitance between node $i$ and node $j$. It therefore follows that, in each row, the sum of the absolute values of the off-diagonal terms is less than or equal to the diagonal term. Strict inequality holds if there is a nonzero capacitance between node $i$ and a voltage source or the ground node. This example leads to the following important observation, which is easily verified.

Observation: Suppose a system of equations of the form of Eqn. (2.2) is constructed by applying the nodal analysis technique described above to a circuit whose charge elements are capacitors (linear or nonlinear), or any other elements whose charge function has a diagonally dominant Jacobian. Then the capacitance matrix $C(x,u)$ of Eqn. (2.2) is diagonally dominant. If, in addition, each node in the circuit has a linear or nonlinear capacitor, bounded away from zero, to ground or to a voltage source, then the matrix $C(x,u)$ is strictly diagonally dominant for all $x$, $u$.
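The Observation can be checked directly for the two-node example of Fig. 2.3, whose capacitance matrix is $[[C_1 + C_f, -C_f], [-C_f, C_2 + C_f]]$. The sketch below tests row-wise strict diagonal dominance (function names hypothetical). Removing the grounded capacitor at node 1 ($C_1 = 0$) leaves that row only weakly dominant, as the Observation predicts.

```python
def is_strictly_diagonally_dominant(C):
    """Row-wise strict diagonal dominance: |c_ii| > sum_{j != i} |c_ij| for every row i."""
    return all(
        abs(row[i]) > sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(C)
    )


def two_node_capacitance(c1, c2, cf):
    """C matrix of the two-node example: grounded c1, c2 and coupling capacitor cf."""
    return [[c1 + cf, -cf], [-cf, c2 + cf]]


print(is_strictly_diagonally_dominant(two_node_capacitance(1.0, 2.0, 0.5)))  # True
print(is_strictly_diagonally_dominant(two_node_capacitance(0.0, 2.0, 0.5)))  # False
```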

SECTION 2.1.2 - EXTENDING THE CONSTRUCTION TECHNIQUE

The nodal analysis technique can only be used to form the differential equations of circuits with elements whose current or charge is a well-behaved function of voltage. It is possible to extend the technique to include circuits with inductors and floating voltage sources by using Modified Nodal Analysis [38]. A similar technique is used in this section to show that circuits with these two types of elements can be described by a differential equation system of the form of Eqn. (2.2). This demonstrates that the form of Eqn. (2.2) can encompass much more than just circuits with voltage-controlled current and charge elements. It also justifies considering only systems of the form of Eqn. (2.2) for the rest of this thesis.

Consider a large network with two nodes that are connected by a floating voltage source, as in Fig. 2.4. The nodal analysis equations can be written for the two nodes. For node a,

$$\sum_{j=1}^{k} i_j(v_a, v_b, v) + i_{vs} = 0$$

and for node b,

$$\sum_{j=k+1}^{l} i_j(v_a, v_b, v) - i_{vs} = 0$$

where $v$ is the vector of all the other node voltages and $i_{vs}$ is the current through the voltage source. Since an additional variable has been introduced, an additional equation is needed to compute the solution:

$$v_a = v_b + V_s$$

where $V_s$ is the source voltage. In order to convert this set of equations into the form of Eqn. (2.2), we add the two node equations, which eliminates $i_{vs}$, and substitute $v_a = v_b + V_s$. This generates one equation in one unknown (here we have arbitrarily chosen $v_b$):

$$\sum_{j=1}^{l} i_j(v_b + V_s, v_b, v) = 0$$

It is somewhat more complicated to reorganize the equations of circuits with inductors so that they fit into the form of Eqn. (2.2). This is because the voltage across an inductor is a function of the time derivative of the current passing through it. For the example in Fig. 2.5a, the KCL equation for node a is

$$\sum_{j=1}^{k} i_j(v_a, v_b, v) + i_{ind} = 0$$

for node b,

$$\sum_{j=k+1}^{l} i_j(v_a, v_b, v) - i_{ind} = 0$$

and for the inductor,

$$L\,\frac{d}{dt}\, i_{ind} - (v_a - v_b) = 0$$

where $v_a$ and $v_b$ are the voltages at the inductor terminals; $v$ is the vector of node voltages for the entire circuit excluding $v_a$ and $v_b$; $i_{ind}$ is the inductor current; and $L$ is its inductance.

Since the derivative of the inductor current is present in the equations, the current must be included in the set of state variables in order to include the inductor in the system of Eqn. (2.2). A circuit interpretation of such a reorganization is to replace the inductor by an extra circuit node, a grounded capacitor of capacitance $L$, and two voltage-controlled current sources (see Fig. 2.5b). The extra row that including an inductor adds to Eqn. (2.2) will not destroy the invertibility or strict diagonal dominance property of $C(x,u)$. This is because the extra row in $C(x,u)$ contains only one nonzero entry, $L$, on the diagonal.

SECTION 2.2 - NUMERICAL INTEGRATION PROPERTIES

Once the system of differential equations has been constructed from the circuit topology, it must be solved numerically. The usual approach is to use one of the many numerical integration formulas to convert the differential equations that describe the system into a sequence of nonlinear algebraic equations.

For example, the most obvious numerical integration formula is the explicit-Euler algorithm. Given the initial condition $x(0) = x_0$, it is possible to compute an approximation to $x(h)$, $h > 0$, by substituting $(\hat{x}(h) - x(0))/h$ for $\dot{x}(0)$. Here the notation $\hat{x}$ indicates a numerical approximation. Substituting this discrete approximation into Eqn. (2.2) yields the following equation for $\hat{x}(h)$:

$$\hat{x}(h) = x(0) + h\, C(x(0), u(0))^{-1} f(x(0), u(0)) \tag{2.5}$$

By substituting $\hat{x}(h)$ for $x(0)$ (and $u(h)$ for $u(0)$) in Eqn. (2.5), it is possible to compute $\hat{x}(2h)$. The process can be repeated to produce a sequence that approximates the exact solution to the differential equation at discrete points in time.
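Repeated application of Eqn. (2.5) takes only a few lines of code. The sketch below assumes the caller supplies functions `C(x, u)`, `f(x, u)`, and `u(t)` (hypothetical names). Instead of forming $C^{-1}$ explicitly, it solves $C\,y = f$ at each step by Gaussian elimination without pivoting. This is safe when $C$ is strictly diagonally dominant, as it is for the circuits described by the Observation of Section 2.1.1.

```python
def solve_diag_dominant(C, b):
    """Solve C y = b by Gaussian elimination without pivoting.

    Pivoting is unnecessary when C is strictly diagonally dominant.
    """
    n = len(b)
    A = [list(row) for row in C]
    y = list(b)
    for k in range(n):
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= m * A[k][c]
            y[r] -= m * y[k]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        x[k] = (y[k] - sum(A[k][c] * x[c] for c in range(k + 1, n))) / A[k][k]
    return x


def explicit_euler(C, f, u, x0, h, steps):
    """Repeated Eqn. (2.5): x(t + h) = x(t) + h * C(x, u)^{-1} f(x, u)."""
    xs = [list(x0)]
    for m in range(steps):
        x, ut = xs[-1], u(m * h)
        dxdt = solve_diag_dominant(C(x, ut), f(x, ut))  # C^{-1} f without forming C^{-1}
        xs.append([xi + h * di for xi, di in zip(x, dxdt)])
    return xs
```

For the scalar equation $\dot{x} = -x$ with $h = 0.1$, ten steps give $0.9^{10}$, the explicit-Euler approximation to $e^{-1}$.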

The explicit-Euler algorithm is the simplest of a wide variety of discretization techniques for numerically solving large systems of differential equations. In order to choose a discretization method that will be efficient and accurate for a given class of problems, it is necessary to consider several properties of the integration method with respect to that class. In this section we will consider several key aspects of the circuit simulation problem that impact the choice of numerical method. We will start by presenting the classical consistency/stability/convergence criteria, both for completeness and as a vehicle for introducing the notation used throughout this thesis. We will then consider more specific properties of the circuit simulation problem, starting with the well-known issue of stiffness. Following this, the properties of charge conservation and domain of dependence will be defined, and in each case we will consider the impact these properties have on the choice of numerical method.

SECTION 2.2.1 - CONSISTENCY, STABILITY, AND CONVERGENCE

In general, a numerical integration formula produces a sequence of approximations to the solution of a differential equation by repeated application, starting from some initial condition $x_0$. We will denote the approximation produced by the $m$-th application of a given numerical integration formula to Eqn. (2.2) by $\hat{x}(\tau_m)$. Here $\tau_m \in \mathbb{R}$ is such that $\hat{x}(\tau_m)$ is the numerical approximation to the exact solution at $t = \tau_m$. If the differential equation is to be solved numerically on $[0,T]$, it will be assumed that there exists some finite integer $M$ such that $\tau_M = T$. In addition, we will refer to $h_m = \tau_m - \tau_{m-1}$ as the $m$-th discretization timestep. Finally, we will denote the entire sequence $\hat{x}(\tau_m)$, $m \in \{0, \ldots, M\}$, by $\{\hat{x}(\tau_m)\}$.

If a numerical integration algorithm is to be of any use, it must be possible to approximate the exact solution of the differential equation system arbitrarily accurately, uniformly over $[0,T]$, by reducing the discretization timesteps. An integration method with this property is said to be convergent, defined formally as follows:

Definition 2.1: Let the discretization timesteps be fixed; that is, $h_m = T/M$ for all $m \in \{1, \ldots, M\}$. A numerical integration method is convergent with respect to Eqn. (2.2) if the global error, defined by

$$\max_{m \in \{0,\ldots,M\}} \left\| x(\tau_m) - \hat{x}(\tau_m) \right\| \tag{2.6}$$

goes to zero as $M \rightarrow \infty$. ■
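Definition 2.1 can be exercised on the scalar test equation $\dot{x} = -x$, $x(0) = 1$, on $[0,1]$, whose exact solution $e^{-t}$ is known. The sketch below is an illustration, not a proof, and `global_error` is a hypothetical helper. It computes the global error of Eqn. (2.6) for explicit Euler as $M$ grows. The error roughly halves each time $M$ doubles, which is the first-order behavior expected of this method.

```python
import math


def global_error(M, lam=-1.0, x0=1.0, T=1.0):
    """Eqn. (2.6) for explicit Euler on x' = lam * x with fixed timestep h = T / M."""
    h = T / M
    x_hat = x0
    err = 0.0
    for m in range(1, M + 1):
        x_hat = x_hat + h * lam * x_hat          # one explicit-Euler step
        exact = x0 * math.exp(lam * m * h)       # exact solution at tau_m
        err = max(err, abs(exact - x_hat))
    return err


for M in (10, 20, 40, 80):
    print(M, global_error(M))
```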

For a numerical integration method to be convergent, it must have two properties. The error made in one timestep must go to zero rapidly as the timestep decreases, and the errors must not grow too rapidly from one timestep to the next. The error made in one timestep is called the local truncation error (LTE).

Definition 2.2: Let $\hat{x}(\tau_m)$ be generated by applying one step of a numerical integration formula to a system of the form of Eqn. (2.2), given the sequence $\{\hat{x}(\tau_{\tilde{m}})\}$, $\tilde{m} < m$, such that $\hat{x}(\tau_{\tilde{m}}) = x(\tau_{\tilde{m}})$. Then the local truncation error is defined as $\| \hat{x}(\tau_m) - x(\tau_m) \|$. ■

The best that one could hope to show for general systems is that the global error for the approximation, that is, $\max_{m} \| \hat{x}(\tau_m) - x(\tau_m) \|$, is a function of the sum of the local truncation errors, $\sum_{m=1}^{M} LTE_m$. Here $LTE_m$ is the local truncation error at the $m$-th timestep. Given a fixed interval $[0,T]$ and $M = T/h$, this sum is bounded below by $\frac{T}{h} LTE_{min}$, where $LTE_{min}$ is the minimum of the LTEs over all $m$. If this sum is to go to zero as $h \rightarrow 0$, then at least $LTE_{min}/h$ must vanish as $h \rightarrow 0$. Requiring this of the local truncation error at every step gives the condition

$$\lim_{h \rightarrow 0} \frac{LTE_m}{h} = 0.$$
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This  property  is  known  as  consistency  [1]  and  is  shared  by  any  "reasonable"  numerical  integration 
method. 

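As an example, it is possible to verify that the explicit-Euler algorithm is consistent for systems of the form of Eqn. (2.2) by using a Taylor series expansion about $x(\tau_m)$. That is,

$$x(\tau_{m+1}) = x(\tau_m) + h_{m+1}\,\dot{x}(\tau_m) + \frac{h_{m+1}^2}{2}\,\ddot{x}(\tau),$$

where $\tau \in [\tau_m, \tau_{m+1}]$. From Eqn. (2.5), with $\hat{x}(\tau_m) = x(\tau_m)$ as in Definition 2.2, we get

$$\hat{x}(\tau_{m+1}) = x(\tau_m) + h_{m+1}\,C(x(\tau_m), u(\tau_m))^{-1} f(x(\tau_m), u(\tau_m)).$$

Substituting for $\dot{x}$ using the identity

$$C(x(\tau_m), u(\tau_m))^{-1} f(x(\tau_m), u(\tau_m)) = \dot{x}(\tau_m),$$

and then subtracting,

$$\| \hat{x}(\tau_{m+1}) - x(\tau_{m+1}) \| = \frac{h_{m+1}^2}{2}\,\| \ddot{x}(\tau) \| \qquad [2.7]$$

Dividing by $h_{m+1}$ gives $LTE / h_{m+1} = \frac{h_{m+1}}{2}\|\ddot{x}(\tau)\|$, which goes to zero with $h_{m+1}$ when $\ddot{x}$ is bounded. This verifies consistency.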
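A small numerical check of Eqn. (2.7) uses the same assumed test problem, $\dot{x} = -x$, so that $\ddot{x} = x$. One explicit-Euler step is started from the exact value at $\tau_m = 0.5$, as Definition 2.2 requires. The resulting error is then compared with $h$ and with $0.5\,h^2\ddot{x}$. The helper name is hypothetical.

```python
import math

def lte_explicit_euler(h, t_m=0.5, lam=-1.0):
    """Local truncation error (Definition 2.2) of one explicit-Euler step of
    the assumed problem x' = lam*x, started from the exact value x(t_m)."""
    x_m = math.exp(lam * t_m)                  # exact starting value
    x_hat = x_m + h * lam * x_m                # one explicit-Euler step
    return abs(x_hat - math.exp(lam * (t_m + h)))

for h in (1e-1, 1e-2, 1e-3):
    lte = lte_explicit_euler(h)
    # LTE/h shrinks like h; LTE / (0.5 h^2 x''(t_m)) approaches 1
    print(h, lte / h, lte / (0.5 * h * h * math.exp(-0.5)))
```

The ratio LTE/h shrinks in proportion to $h$, which is the consistency condition. The ratio of the LTE to $0.5\,h^2\ddot{x}(\tau_m)$ approaches one.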

Consistency is not sufficient to guarantee that a numerical integration method is convergent. Consistency only ensures that the local errors are small; it does not indicate anything about how the errors propagate from one timestep to the next. To ensure convergence we need to verify that the numerical integration method has a second property, that of stability [1].


Definition 2.3: A numerical integration method applied to Eqn. (2.2) is stable if there exists an $h_0$ and a constant $K < \infty$ such that for any two different initial conditions $x_0$ and $x'_0$, and any timestep $h < h_0$, the sequences $\{\hat{x}(\tau_m)\}$ and $\{\hat{x}'(\tau_m)\}$ computed from them satisfy

$$\| \hat{x}(\tau_M) - \hat{x}'(\tau_M) \| < K\,\| x_0 - x'_0 \|. \quad ■$$

The explicit-Euler algorithm is stable, but the proof is lengthy and well-documented elsewhere [1], so we will not repeat it here.

Not  surprisingly,  we  have  the  following  classical  result: 


Theorem 2.1: If a numerical integration method is consistent and stable with respect to Eqn. (2.2), then it is convergent with respect to Eqn. (2.2). ■

Several different proofs have been given for this basic result [1].

If an integration method is convergent, then when the method is used to compute an approximate solution to a differential equation system, sufficient accuracy can be ensured by using timesteps that are small enough. Obviously, it is possible to ensure that the timesteps are small enough by using extremely small timesteps, but this is very inefficient. Instead, the integration timesteps are usually controlled by using some check on the discretization error. If, in any given step, the error becomes too large, the timestep is replaced by a smaller timestep.

Usually, the check on the discretization error is some computed estimate of the local truncation error. For the explicit-Euler algorithm, for example, the exact local truncation error at the $m$-th step is $0.5\,h_{m+1}^2\,\ddot{x}(\tau)$, where $\tau \in [\tau_m, \tau_{m+1}]$. An estimate of the local truncation error of the $m$-th explicit-Euler step can be computed using the following divided-difference estimate for $\ddot{x}$:

$$\ddot{x} \approx \frac{\dfrac{\hat{x}(\tau_{m+1}) - \hat{x}(\tau_m)}{h_{m+1}} - \dfrac{\hat{x}(\tau_m) - \hat{x}(\tau_{m-1})}{h_m}}{0.5\,(h_{m+1} + h_m)} \qquad [2.8]$$
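Below is a sketch of the divided-difference estimate of Eqn. (2.8). It assumes that three equally spaced samples of $x(t) = e^{-t}$ stand in for the computed values $\hat{x}(\tau_{m-1})$, $\hat{x}(\tau_m)$ and $\hat{x}(\tau_{m+1})$. In a simulator these values would come from the integration itself. The function name `lte_estimate` is hypothetical.

```python
import math

def lte_estimate(x_prev, x_m, x_next, h_m, h_next):
    """Estimate of the explicit-Euler LTE, 0.5 * h_{m+1}^2 * |x''|, with x''
    replaced by the divided difference of Eqn. (2.8)."""
    slope_next = (x_next - x_m) / h_next      # (x(tau_{m+1}) - x(tau_m)) / h_{m+1}
    slope_prev = (x_m - x_prev) / h_m         # (x(tau_m) - x(tau_{m-1})) / h_m
    xdd = (slope_next - slope_prev) / (0.5 * (h_next + h_m))
    return 0.5 * h_next ** 2 * abs(xdd)

# Samples of the assumed solution x(t) = exp(-t) at t = 0.4, 0.5, 0.6.
h = 0.1
est = lte_estimate(math.exp(-0.4), math.exp(-0.5), math.exp(-0.6), h, h)
# True LTE of one explicit-Euler step of x' = -x from the exact value at t = 0.5.
true_lte = abs(math.exp(-0.5) * (1.0 - h) - math.exp(-0.6))
print(est, true_lte)
```

Here the estimate lands within a few percent of the true LTE. A timestep controller would compare such an estimate with a user tolerance. When the tolerance is exceeded, it would retry the step with a smaller $h_{m+1}$. The tolerance policy itself is left open here.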


Most of the techniques for estimating local truncation error are only estimates, not bounds. In practice, these types of estimates have proved to be reliable, but there are certain common cases where the estimates are much smaller than the actual error. An example of such a case will be presented in Section 2.2.4.

SECTION  2.2.2  -  STIFFNESS  AND  A-STABILITY

Consider the example in Fig. 2.6, a resistor-capacitor circuit. The differential equation that describes the circuit can be constructed using the nodal analysis technique above and is:

$$\dot{v}(t) = -100\,v(t), \qquad v(0) = 1.0 \qquad [2.9]$$

where $v(t) \in \mathbb{R}$ is the node voltage. The exact solution for $v$ is $v(t) = e^{-100t}$. If the interval of interest is $[0,T]$ with $T \gg 0.01$, this is a two time-scale problem. That is, $v$ changes very rapidly compared to the interval of interest.

Any system of differential equations that has the kind of multiple time-scale properties of the above example is said to be stiff. Most circuits of interest generate stiff differential equation systems, and this strongly affects the choice of integration formulas. For example, the explicit-Euler algorithm applied with timesteps $h_m$ to numerically solve Eqn. (2.9) yields the following recursion equation for $\hat{v}$:

$$\hat{v}(\tau_m) = (1 - 100\,h_m)\,\hat{v}(\tau_{m-1})$$

or, given $v(0) = 1$,

$$\hat{v}(\tau_m) = \prod_{i=1}^{m} (1 - 100\,h_i).$$

Clearly, $|\hat{v}(\tau_m)|$ will decay only if $h_m < 0.02$ for all $m$, since $|1 - 100\,h_m| < 1$ requires $0 < h_m < 0.02$. Also, $\hat{v}(\tau_m)$ will decay monotonically to 0 only if $h_m < 0.01$ for all $m$, since only then is each factor positive. If larger timesteps are used, $|\hat{v}(\tau_m)|$ will grow. This implies that in order to accurately compute a sequence approximating the solution of this system using explicit-Euler, small timesteps must be used even when the solution is not changing appreciably.
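The recursion above can be checked directly. This minimal Python sketch applies fixed-step explicit-Euler to Eqn. (2.9) for three timesteps: one below 0.01, one between 0.01 and 0.02, and one above 0.02. The function name is hypothetical.

```python
def explicit_euler_rc(h, steps, v0=1.0):
    """Fixed-step explicit-Euler applied to Eqn. (2.9), v' = -100 v:
    each step multiplies v by the amplification factor (1 - 100 h)."""
    v = v0
    history = [v]
    for _ in range(steps):
        v = (1.0 - 100.0 * h) * v
        history.append(v)
    return history

for h in (0.005, 0.015, 0.03):
    print(h, explicit_euler_rc(h, 10)[-1])
```

With $h = 0.005$ the sequence decays monotonically. With $h = 0.015$ it decays while alternating in sign. With $h = 0.03$ the amplification factor is $-2$, so the magnitude roughly doubles every step.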

Now consider a slightly different numerical integration formula, the implicit-Euler algorithm, where $\dot{v}(\tau_m)$ is approximated by $\frac{1}{h_m}(\hat{v}(\tau_m) - \hat{v}(\tau_{m-1}))$. Just like explicit-Euler, implicit-Euler is convergent, and the local truncation error is of order $h^2$. When applied to Eqn. (2.9), the following recursion equation results:

$$\hat{v}(\tau_m) - \hat{v}(\tau_{m-1}) = -100\,h_m\,\hat{v}(\tau_m)$$

or, reorganizing,

$$\hat{v}(\tau_m) = \frac{1}{1 + 100\,h_m}\,\hat{v}(\tau_{m-1}).$$

Again using the fact that $v(0) = 1$,

$$\hat{v}(\tau_m) = \prod_{i=1}^{m} (1 + 100\,h_i)^{-1}.$$

Note  that  in  this  case,  any  hm  >  0  will  produce  a  monotonically  decaying  sequence.  The  tremendous 
advantage  of  this  method  over  explicit-Euler  is  that  small  timesteps  can  be  used  for  the  first  few  steps 
to  accurately  resolve  the  rapid  decay,  and  when  the  solution  stops  changing  appreciably,  the  timestep 
can  safely  be  made  orders  of  magnitude  larger  without  causing  the  computed  solution  to  grow. 
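Here is a companion sketch for implicit-Euler on Eqn. (2.9). It uses a hypothetical timestep schedule: fifty steps of $0.001$ to resolve the fast transient, then five steps of $1.0$ to step over the flat tail.

```python
def implicit_euler_rc(timesteps, v0=1.0):
    """Implicit-Euler applied to Eqn. (2.9), v' = -100 v, with variable
    timesteps: v_m = v_{m-1} / (1 + 100 h_m)."""
    v = v0
    history = [v]
    for h in timesteps:
        v = v / (1.0 + 100.0 * h)
        history.append(v)
    return history

# Small steps through the fast decay, then steps one thousand times larger.
schedule = [0.001] * 50 + [1.0] * 5
values = implicit_euler_rc(schedule)
print(values[50], values[-1])
```

Every computed value stays positive and decreasing despite the thousandfold jump in timestep. The same schedule with explicit-Euler would multiply the solution by $1 - 100 = -99$ on each large step.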

The  implicit-Euler  algorithm  has  a  property  that  is  "stronger"  than  the  numerical  stability  of 
Definition  2.3,  which  we  define  below  as  A-stability: 

Definition 2.4: Let $\{\hat{x}(\tau_m)\}$ be the sequence generated by a numerical integration method applied to the equation

$$\dot{x}(t) = A\,x(t), \qquad x(0) = x_0,$$

where $x(t) \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$, and $h_m = h$ for all $m$. Given $\{\lambda_i\}$, the set of eigenvalues of $A$, the region of stability for the integration method is the subset of $\mathbb{C}$ such that if $h\lambda_i$ is inside the region of stability for all $i$, then $\hat{x}(\tau_m) \to 0$ as $m \to \infty$. The numerical integration method is A-stable if the region of stability includes the entire left-half plane of $\mathbb{C}$. ■

The above definition differs from the original definition given by Dahlquist [42] in that a matrix rather than a scalar test problem is used [6]. As will become apparent in the following sections, a matrix test problem is more appropriate for analyzing methods designed for large-scale systems.
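Definition 2.4 can be tested numerically for a small matrix. The sketch below assumes a hypothetical $2 \times 2$ matrix with eigenvalues $-1 \pm 10i$ and computes them from the characteristic polynomial. It then checks whether every $h\lambda_i$ lies in the region of stability of explicit-Euler ($|1 + h\lambda| < 1$) or of implicit-Euler ($|1 - h\lambda| > 1$). These region formulas follow from the one-step recursions $\hat{x}_{m+1} = (I + hA)\hat{x}_m$ and $\hat{x}_{m+1} = (I - hA)^{-1}\hat{x}_m$. All function names are hypothetical.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a real 2x2 matrix from its characteristic polynomial."""
    (a, b), (c, d) = A
    half_trace = 0.5 * (a + d)
    disc = cmath.sqrt(half_trace ** 2 - (a * d - b * c))
    return half_trace + disc, half_trace - disc

def explicit_euler_region(z):
    """x_{m+1} = (1 + z) x_m with z = h*lambda: stable iff |1 + z| < 1."""
    return abs(1 + z) < 1

def implicit_euler_region(z):
    """x_{m+1} = x_m / (1 - z): stable iff |1 - z| > 1, true for all Re z < 0."""
    return abs(1 - z) > 1

def is_stable(region, A, h):
    """Definition 2.4: every h*lambda_i must lie inside the region of stability."""
    return all(region(h * lam) for lam in eigenvalues_2x2(A))

A = [[-1.0, 10.0], [-10.0, -1.0]]   # hypothetical example, eigenvalues -1 +/- 10i
print(is_stable(explicit_euler_region, A, 0.1), is_stable(implicit_euler_region, A, 0.1))
```

For this matrix explicit-Euler needs a timestep like $h = 0.01$ and fails at $h = 0.1$. Implicit-Euler is stable for any $h > 0$ because both eigenvalues lie in the left-half plane.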

Both the explicit-Euler and implicit-Euler algorithms can be used to produce arbitrarily accurate discrete approximations to the exact solution of Eqn. (2.9), as both are convergent. The implicit-Euler algorithm will allow much larger timesteps to be used with no appreciable loss of accuracy and hence will be more efficient. But improving efficiency is not the only reason one would choose implicit-Euler, or another A-stable numerical integration method. There is also the consideration of numerical robustness. That is, if an A-stable method is used, the timestep can safely be set by considering only local truncation error criteria, which can be reasonably estimated. If a method that is not A-stable is used, the timestep must be bounded to ensure stability. For a linear problem such a bound will be a function of the eigenvalues, and it is difficult to get reasonable estimates of eigenvalues.

SECTION  2.2.3  -  CHARGE  CONSERVATION 

Many differential equation systems generated from physical problems can be characterized by the preservation of certain quantities, and frequently it is important that the numerical method also preserve these quantities. For example, when numerically solving the differential equations that describe the motion of a swinging pendulum in a frictionless environment, it is important to ensure that energy remains constant. If energy increased due to numerical error, the computed solution would indicate that the pendulum would swing higher and higher. If energy were lost, the computed solution would indicate that the pendulum would eventually come to a halt.
In the case of systems of equations that describe circuits, total charge is a conserved physical quantity. To show this, consider surrounding an arbitrary circuit by a Gaussian surface. Since the surface is unpunctured, the charge contained inside must remain constant [43]. As a consequence, the sum of all the currents must be zero, as the sum of the currents is the derivative with respect to $t$ of the sum of the charges.

This truly trivial observation cannot be applied directly to the differential equation systems constructed using nodal analysis as above. If the sum of the node charges in Eqn. (2.4) were precisely zero, then $C(x,u)$ in Eqn. (2.2) would be singular, and Eqn. (2.2) would not necessarily have a unique solution. In order to produce systems of equations that do have unique solutions, two sets of KCL equations are left out: the one for an arbitrary reference node, and those for nodes whose voltages are given a priori. A solution for the reference node of $v_r(t) = 0$ for all $t$ is assumed.

As an example, consider the simple resistor-capacitor circuit of Fig. 2.6a. In terms of charges, the differential equation that describes the behavior of the circuit is

$$\dot{q}(v(t)) = -g\,v(t), \qquad v(0) = 1.0,$$

where the charge $q(v(t)) = c\,v(t)$. The solution, $v(t) = e^{-\frac{g}{c}t}$, is not a constant, so neither is the charge $q$. The differential equation does not exhibit charge conservation because not all the charges have been considered, and only the sum remains constant. The charge on the ground or reference node is $-c\,v(t)$, and obviously the sum of the two is zero for all $t$.

If KCL is applied to every node in the resistor-capacitor example, including the reference node, an appended system is generated:

$$\dot{v}(t) - \dot{v}_r(t) = -\frac{g}{c}\,(v(t) - v_r(t))$$

$$\dot{v}_r(t) - \dot{v}(t) = -\frac{g}{c}\,(v_r(t) - v(t))$$

This system does not have a unique solution but an infinite collection, unless it is assumed that $v_r(t) = 0$. However, for any of the solutions the sum of the node charges remains constant.

It is possible to use appended systems generated by applying KCL to every node in a circuit to test how well a numerical integration method conserves charge. If the method is applied to the appended system, then charge conservation can be checked by summing all the charges at each timestep to ensure the sum remains constant. The algebraic equations generated by the numerical integration method can still be solved in the usual fashion. The known node voltages and a reference voltage are used to eliminate the equations associated with the appended differential equations.

Consider explicit-Euler applied to an autonomous system (independent of $u(t)$) of the form of Eqn. (2.2), constructed by applying KCL to every node in the circuit. This yields

$$\frac{\partial q}{\partial v}(\hat{v}(\tau_m))\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m)) = h_{m+1}\,f(\hat{v}(\tau_m)),$$

where $\frac{\partial q}{\partial v}(\hat{v}(\tau_m))$ is the, possibly singular, Jacobian of $q(\hat{v}(\tau_m))$, the vector of all the node charges. Assume that at $\tau_m$ the sum of the node charges is $\sum_{i=1}^{n} q_i(\hat{v}(\tau_m)) = K$, where $K$ is some constant. Then charge is conserved only if $\sum_{i=1}^{n} q_i(\hat{v}(\tau_{m+1}))$ is also equal to $K$. This is not necessarily the case, as can be seen from the Taylor series expansion of $q(\hat{v}(\tau_{m+1}))$ about $q(\hat{v}(\tau_m))$:


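$$q(\hat{v}(\tau_{m+1})) = q(\hat{v}(\tau_m)) + \frac{\partial q}{\partial v}(\hat{v}(\tau_m))\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m)) + \frac{1}{2}\,\frac{\partial^2 q}{\partial v^2}(\tilde{v})\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m))\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m)) \qquad [2.10]$$

where $\tilde{v} \in [\hat{v}(\tau_m), \hat{v}(\tau_{m+1})]$. Substituting $h_{m+1} f(\hat{v}(\tau_m))$ for $\frac{\partial q}{\partial v}(\hat{v}(\tau_m))\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m))$ leads to

$$q(\hat{v}(\tau_{m+1})) = q(\hat{v}(\tau_m)) + h_{m+1}\,f(\hat{v}(\tau_m)) + \frac{1}{2}\,\frac{\partial^2 q}{\partial v^2}(\tilde{v})\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m))\,(\hat{v}(\tau_{m+1}) - \hat{v}(\tau_m)).$$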

Summing the node charges,

$$\sum_{i=1}^{n} q_i(\hat{v}(\tau_{m+1})) = \sum_{i=1}^{n} q_i(\hat{v}(\tau_m)) + h_{m+1} \sum_{i=1}^{n} f_i(\hat{v}(\tau_m)) + O(h_{m+1}^2), \qquad [2.11]$$

where $O(\cdot)$ is any function such that $\lim_{a \to 0} \frac{|O(a)|}{a} < \infty$. To simplify Eqn. (2.11), another property of the original network from which the KCL equations were generated can be used. $f(\cdot)$ is the vector of sums of the currents incident at each node, and any current leaving a node must arrive at some other node. Therefore $\sum_{i=1}^{n} f_i(\hat{v}(\tau_m))$ must be identically zero. Using this fact leads to

$$\sum_{i=1}^{n} q_i(\hat{v}(\tau_{m+1})) = K + O(h_{m+1}^2),$$

which implies that the sum of the node charges will not remain constant unless the second-order term in Eqn. (2.10) is zero. This will be true if all the node charges are linear functions of the node voltages, but will not be true in general.

The sum of the charges is constant in the limit as $h_{m+1}$ goes to zero, so the nonconstant charge can be viewed as another measure of the local truncation error. However, if the same integration method is applied slightly differently, using charge as a state variable, then the sum of the node charges will stay constant regardless of the stepsize. To demonstrate this we again apply the explicit-Euler algorithm, but to the system in the form of an autonomous version of Eqn. (2.4). Discretizing the charge function leads to

$$q(\hat{v}(\tau_{m+1})) - q(\hat{v}(\tau_m)) = h_{m+1}\,f(\hat{v}(\tau_m)).$$
That the sum of charge is constant, independent of the stepsize, follows from

$$\sum_{i=1}^{n} q_i(\hat{v}(\tau_{m+1})) = \sum_{i=1}^{n} q_i(\hat{v}(\tau_m)) + h_{m+1} \sum_{i=1}^{n} f_i(\hat{v}(\tau_m))$$

and the fact, mentioned above, that $\sum_{i=1}^{n} f_i(\hat{v}(\tau_m)) = 0$.
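The difference between the two formulations can be seen in a small experiment. The sketch assumes a hypothetical circuit of two nodes joined by a resistor of conductance 1. Each node has a nonlinear capacitor to the reference node with charge $q(v) = v + 0.5v^2$. The resistive currents into the two nodes sum to zero for every voltage, and no resistive path reaches the reference node, so the exact solution keeps $q_1 + q_2$ constant. Explicit-Euler is applied once with voltage as the state and once with charge as the state. All names and element values are illustrative assumptions.

```python
import math

G = 1.0  # assumed conductance of the resistor joining node 1 and node 2

def charge(v):
    """Assumed nonlinear node charge q(v) = v + 0.5 v^2 (capacitance 1 + v)."""
    return v + 0.5 * v * v

def capacitance(v):
    """dq/dv for the charge model above."""
    return 1.0 + v

def voltage_from_charge(q):
    """Inverse of q(v) on the branch v > -1."""
    return -1.0 + math.sqrt(1.0 + 2.0 * q)

def currents(v1, v2):
    """Resistive current into each node; f1 + f2 == 0 for every (v1, v2)."""
    i = G * (v2 - v1)
    return i, -i

def total_charge_voltage_state(v1, v2, h, steps):
    """Explicit-Euler with voltage as the state: C(v) (v_new - v) = h f(v)."""
    for _ in range(steps):
        f1, f2 = currents(v1, v2)
        v1, v2 = v1 + h * f1 / capacitance(v1), v2 + h * f2 / capacitance(v2)
    return charge(v1) + charge(v2)

def total_charge_charge_state(v1, v2, h, steps):
    """Explicit-Euler with charge as the state: q_new = q + h f(v)."""
    q1, q2 = charge(v1), charge(v2)
    for _ in range(steps):
        f1, f2 = currents(v1, v2)
        q1, q2 = q1 + h * f1, q2 + h * f2
        v1, v2 = voltage_from_charge(q1), voltage_from_charge(q2)
    return charge(v1) + charge(v2)

# Initial total charge is q(1) + q(0) = 1.5.
print(total_charge_voltage_state(1.0, 0.0, 0.1, 50))
print(total_charge_charge_state(1.0, 0.0, 0.1, 50))
```

With voltage as the state, each step adds $\frac{1}{2}\sum_i q''(\tilde v)(\Delta v_i)^2$ to the total charge. Here $q'' = 1$, so the increment is exactly $\frac{1}{2}\sum_i (\Delta v_i)^2$, and the first step with $h = 0.1$ already moves the total from 1.5 to 1.50625. With charge as the state, the total stays at 1.5 up to rounding for any stable stepsize.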

We  use  these  ideas  to  precisely  define  the  charge  conservation  property. 

Definition 2.5: A system of the form of Eqn. (2.4) is of type S if it has the following two properties: for any exact solution the sum $\sum_{i=1}^{n} q_i(v(t))$ is a constant independent of $t$; and $\sum_{i=1}^{n} f_i(v) = 0$ for any $v \in \mathbb{R}^n$. A numerical integration method has the charge conservation property if, when applied to any system of type S, the computed sequence $\{\hat{v}(\tau_m)\}$ is such that $\sum_{i=1}^{n} q_i(\hat{v}(\tau_m))$ is a constant independent of $m$. ■

In Section 3.1 we will show that all multistep integration methods applied with charge as the state variable have the charge conservation property.

SECTION  2.2.4  -  DOMAIN  OF  DEPENDENCE 

In the area of partial differential equations, the concept of domain of dependence is well known [44]. The idea is that partial differential equations can be characterized by how rapidly the behavior of points in space propagates with time. As time increases, the set of points that can affect a given point, referred to as the given point's domain of dependence, grows. Consider a numerical method used to solve the partial differential equation. For it to be convergent, that is, to produce arbitrarily accurate solutions as the distance between discretization points becomes small, it must propagate the behavior of each point in space at a rate that at least approaches the rate of the original partial differential equation. In the language of domain of dependence, a numerical method is convergent only if, for each point in space, as the distance between discretization points becomes small, the numerical domain of dependence includes, or comes arbitrarily close to covering, the domain of dependence of the partial differential system.
In this section, an analogous concept will be introduced for large systems of ordinary differential equations. We will not compare the domain of dependence of a numerical method to that of the differential equation system to investigate the method's convergence properties. Instead, we will show that domain of dependence plays a role in the accuracy of the integration method and in how well the errors due to discretization can be controlled.

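Consider the following differential equation system

$$\begin{aligned} \dot{x}_1(t) &= -(x_1(t) - 0.01\,u(t)) \\ \dot{x}_2(t) &= -(x_2(t) - 10\,x_1(t)) \\ &\;\;\vdots \\ \dot{x}_n(t) &= -(x_n(t) - 10\,x_{n-1}(t)) \\ x_i(0) &= 0, \quad i \in \{1,\ldots,n\} \end{aligned} \qquad [2.12a]$$

where the input $u(t) = 1$ for all $t > 0$.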

The exact solution for this system is:

$$x_i(t) = 10^{i-3}\left[1 - e^{-t} \sum_{j=0}^{i-1} \frac{t^j}{j!}\right] \qquad [2.12b]$$

As can be seen by examining Eqn. (2.12b), the solution to the system of Eqn. (2.12a) is a propagating step that is being smoothed and is growing rapidly in amplitude through $n$ stages. Systems with this type of behavior are extremely common among circuit examples (a chain of inverters, for example).
If the explicit-Euler algorithm is applied to Eqn. (2.12a), the computed values are $\hat{x}_1(\tau_1) = 0.01\,h_1$ and $\hat{x}_i(\tau_1) = 0$ for all $1 < i \le n$. In fact, $\hat{x}_i$ will remain zero until the $i$-th timestep, regardless of the size of the timestep. This slow propagation of the solution introduces an error that is in the form of a delay; that is, $\hat{x}_i(\tau_j)$ does not change until $j \ge i$. Explicit-Euler is convergent, so this delay error in time must be driven to zero as the timestep decreases. It is, because the delay $\tau_i$ approaches zero.

If implicit-Euler is applied to Eqn. (2.12a), then $\hat{x}_i(\tau_1) = \frac{10^{i-3}\,h_1^i}{(1 + h_1)^i}$. Therefore, when the implicit-Euler algorithm is used, the behavior of the input is propagated throughout the entire system in one timestep, and there is no error due to delayed propagation of information. This does not necessarily imply that implicit-Euler is more accurate than the explicit-Euler algorithm. For example, applied to Eqn. (2.12a) with a timestep $h_1 = 1$, explicit-Euler produces the solution $\hat{x}_5(\tau_1) = 0.0$, while implicit-Euler produces the solution $\hat{x}_5(\tau_1) = 3.125$. The exact solution is $x_5(1) \approx 0.366$. In this case, then, the explicit-Euler computed solution is closer to the exact solution than the implicit-Euler computed solution, though neither method produces very accurate results.
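The numbers quoted above can be reproduced with a short sketch. It takes $n = 5$, $u(t) = 1$ and $h_1 = 1$. From the zero initial state it performs one explicit-Euler step and one implicit-Euler step of Eqn. (2.12a), then evaluates Eqn. (2.12b). The function names are hypothetical. The implicit-Euler equations for this chain are lower bidiagonal, so forward substitution solves them exactly.

```python
import math

def explicit_euler_step(x, h, u=1.0):
    """One explicit-Euler step of Eqn. (2.12a); x[0] holds x_1."""
    new = [x[0] - h * (x[0] - 0.01 * u)]
    for i in range(1, len(x)):
        new.append(x[i] - h * (x[i] - 10.0 * x[i - 1]))
    return new

def implicit_euler_step(x, h, u=1.0):
    """One implicit-Euler step of Eqn. (2.12a), solved by forward substitution
    because each new x_i depends only on the new x_{i-1}."""
    new = [(x[0] + 0.01 * h * u) / (1.0 + h)]
    for i in range(1, len(x)):
        new.append((x[i] + 10.0 * h * new[i - 1]) / (1.0 + h))
    return new

def exact_solution(i, t):
    """Eqn. (2.12b) for the 1-based index i."""
    partial = sum(t ** j / math.factorial(j) for j in range(i))
    return 10.0 ** (i - 3) * (1.0 - math.exp(-t) * partial)

x0 = [0.0] * 5                                  # n = 5, zero initial state
ee = explicit_euler_step(x0, 1.0)
ie = implicit_euler_step(x0, 1.0)
print(ee[4], ie[4], exact_solution(5, 1.0))    # x_5 after one step, h_1 = 1
```

The printed values are 0.0 for explicit-Euler, 3.125 (up to rounding) for implicit-Euler, and about 0.366 for the exact solution. Every component of the implicit-Euler step is already nonzero, while explicit-Euler has moved only $\hat{x}_1$.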

For this example, accuracy clearly isn't the reason for preferring implicit-Euler's rapid propagation of information to explicit-Euler. Implicit-Euler is a more reliable integration method here because the discretization error in its computed solution is more visible than the discretization error in the solution computed by the explicit-Euler algorithm. To see why this is the case, consider the local truncation error estimate presented in Section 2.2.1,


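$$LTE \approx h_{m+1}^2\,\frac{\dfrac{\hat{x}(\tau_{m+1}) - \hat{x}(\tau_m)}{h_{m+1}} - \dfrac{\hat{x}(\tau_m) - \hat{x}(\tau_{m-1})}{h_m}}{h_{m+1} + h_m} \qquad [2.13]$$

In this case $m = 0$, and there is no history before the first step: $\hat{x}(\tau_m) = \hat{x}(\tau_{m-1}) = x(0)$ and $h_m = 0$. The backward-difference term is therefore taken to be zero, and Eqn. (2.13) can be simplified to

$$LTE \approx \hat{x}(\tau_1) - x(0).$$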




For explicit-Euler this estimate indicates that the LTE for $\hat{x}_5(\tau_1)$ is zero, which is a severe underestimate. A timestep control scheme based on local truncation error would not shrink the timestep in this case, and a very inaccurate solution would be computed. For the implicit-Euler algorithm, the error estimate is 3.125. This is larger than the actual LTE, but it is safe, because an LTE-based timestep control scheme will detect the error and reduce the timestep.

This example indicates that when the explicit-Euler algorithm is applied to a large system, a timestep-dependent limit is introduced on how fast the behavior of an individual state variable propagates through the system. The delay error due to this limited rate of propagation is different from a local truncation error. An arbitrarily high-order explicit multistep integration method could have been used at each step, and still $\hat{x}_i(\tau_m)$ would have been zero until the $i$-th timestep. The implicit-Euler algorithm does not introduce such an a priori limitation on how fast the behavior of an individual state variable propagates through the system. Because of this, when the system behavior is faster than can be propagated by the explicit-Euler algorithm, the implicit-Euler algorithm can produce more accurate results. More importantly, when it produces results that are in error, those errors are more observable.

We end this section, and this chapter, by connecting the concept of the delay introduced by an integration method, the numerical delay, to that of domain of dependence, the concept borrowed from the study of partial differential equations. This connection will provide a simple tool for testing integration methods to determine for what types of systems they will introduce numerical delay.

For  this  purpose,  we  can  define  the  numerical  delay  as  follows: 

Definition 2.6: Given a numerical integration method applied to a system of the form $\dot{x}(t) = A\,x(t)$ with some initial condition $x(0) = x_0$, suppose that $x_i(t) - x_i(0) \ne 0$ for all $t \in (0, \tau]$ for some $\tau > 0$. Then the numerical delay to the $i$-th variable is defined as the smallest integer $M_i$ such that $\hat{x}_i(\tau_{M_i+1}) - x_i(0) \ne 0$. If no such $\tau$ exists, the numerical delay to the $i$-th variable, $M_i$, is zero. The numerical delay for the integration method applied to the given system with the given initial condition is the maximum over all $i$ of the $M_i$. ■
In the example above, the numerical delay for the implicit-Euler algorithm applied to Eqn. (2.12a) is zero, and the numerical delay for explicit-Euler is $n - 1$.
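Definition 2.6 can be evaluated mechanically. The sketch below writes Eqn. (2.12a) in the form $\dot{x} = Ax$ by holding the input in an extra state variable with zero derivative. This is an assumption made only to fit the definition. The sketch takes general explicit- and implicit-Euler steps and records the first step at which each variable leaves its initial value. The caller supplies `moving`, the variables whose exact solution changes immediately, because the sketch does not derive that set from the exact solution. All names are hypothetical, and the dense solver is meant only for tiny systems.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= factor * aug[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (aug[k][n] - sum(aug[k][c] * x[c] for c in range(k + 1, n))) / aug[k][k]
    return x

def explicit_euler(A, x, h):
    return [xi + h * fi for xi, fi in zip(x, matvec(A, x))]

def implicit_euler(A, x, h):
    n = len(x)
    i_minus_ha = [[(1.0 if i == j else 0.0) - h * A[i][j] for j in range(n)]
                  for i in range(n)]
    return solve(i_minus_ha, x)

def numerical_delay(step, A, x0, h, moving, max_steps=1000):
    """Definition 2.6. `moving` lists the variables whose exact solution leaves
    x_i(0) immediately; every other variable has M_i = 0 by definition."""
    x, delays = list(x0), {}
    for m in range(1, max_steps + 1):
        x = step(A, x, h)
        for i in moving:
            if i not in delays and x[i] != x0[i]:
                delays[i] = m - 1          # x_i first moves at tau_{M_i + 1}
        if len(delays) == len(moving):
            return max(delays.values(), default=0)
    raise RuntimeError("a variable did not move within max_steps")

def chain_matrix(n):
    """Eqn. (2.12a) as x' = A x; index 0 holds the constant input u."""
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    A[1][0], A[1][1] = 0.01, -1.0
    for i in range(2, n + 1):
        A[i][i - 1], A[i][i] = 10.0, -1.0
    return A

n = 5
A, x0 = chain_matrix(n), [1.0] + [0.0] * n
print(numerical_delay(explicit_euler, A, x0, 0.1, range(1, n + 1)),
      numerical_delay(implicit_euler, A, x0, 0.1, range(1, n + 1)))
```

For $n = 5$ and $h = 0.1$ this reports a numerical delay of 4, that is $n - 1$, for explicit-Euler, and 0 for implicit-Euler. Neither value depends on the timestep.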

The  description  of  the  role  of  domain  of  dependence  will  be  based  on  the  following  general 
definition: 

Definition 2.7: Given an equation of the form $y = f(x)$, where $x, y \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$, the domain of dependence of the $j$-th variable of the vector $y$, $y_j$, is the set of all $x_i$, $i \in \{1,\ldots,n\}$, such that $\frac{\partial f_j}{\partial x_i}(x) \ne 0$ for some $x$. ■

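Given the matrix test problem

$$\dot{x}(t) = A\,x(t), \qquad x(0) = x_0 \qquad [2.14]$$

where $x(t) \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$, the exact solution at $t = h$ is, in series form,

$$x(h) = \left[I + hA + \frac{h^2}{2!}A^2 + \frac{h^3}{3!}A^3 + \cdots\right] x(0). \qquad [2.15]$$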

The domain of dependence of $x_j(h)$ can be deduced directly from Eqn. (2.15). The variable $x_i(0)$ is in the domain of dependence of $x_j(h)$ if the $(j, i)$ element of $A^p$ is nonzero for some $p \ge 0$.
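The criterion above can be coded directly. This sketch collects the nonzero entries of $A^p$ for $p = 0, \ldots, n-1$. By the Cayley-Hamilton theorem every higher power of $A$ is a linear combination of these, so no further powers can add entries. Exact comparisons with zero are reliable only when the matrix entries and their products are represented exactly, as in the assumed three-stage chain below. The names are hypothetical.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def exact_domain_of_dependence(A):
    """dod[j] is the set of i such that x_i(0) is in the domain of dependence
    of x_j(h) in Eqn. (2.15), i.e. (A^p)[j][i] != 0 for some p >= 0.
    Cayley-Hamilton: powers p >= n add nothing, so p = 0..n-1 suffices."""
    n = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    dod = [set() for _ in range(n)]
    for _ in range(n):
        for j in range(n):
            dod[j].update(i for i in range(n) if power[j][i] != 0.0)
        power = mat_mul(power, A)
    return dod

# Hypothetical three-stage chain in the spirit of Eqn. (2.12a), without the input.
A = [[-1.0, 0.0, 0.0],
     [10.0, -1.0, 0.0],
     [0.0, 10.0, -1.0]]
print(exact_domain_of_dependence(A))
```

For the chain, the last variable depends on all three initial values. In 1-based indexing, $(A^2)_{3,1} = 100$ is nonzero even though $A_{3,1} = 0$.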

The equation for one step of explicit-Euler applied to Eqn. (2.14) is

$$\hat{x}(\tau_1) = [I + h_1 A]\,x(0). \qquad [2.16]$$

As can be seen from the equation, the domain of dependence for the $j$-th variable in Eqn. (2.16) will be a proper subset of the domain of dependence for the $j$-th variable in Eqn. (2.15), unless the higher powers of the matrix $A$ add no nonzero terms beyond those already present in $I + hA$. This would occur, for example, in the case where $A$ is diagonal. If instead one step of implicit-Euler were applied to Eqn. (2.14), the following series expansion results:

$$\hat{x}(\tau_1) = [I + hA + h^2A^2 + h^3A^3 + \cdots]\,x(0), \qquad [2.17]$$
where the series expansion is valid for $h$ such that $h\rho < 1$, where $\rho$ is the spectral radius of $A$. Compare Eqn. (2.17) with Eqn. (2.15): both series contain every power of $A$ with a nonzero coefficient. It follows that, for a small enough $h$, the domains of dependence of the exact solution and the implicit-Euler algorithm are identical for each variable, $\hat{x}_i(\tau_1)$. We define this property below as exhaustive domain of dependence.

Definition 2.8: Suppose that, for a small enough timestep $h$ and for any $A$ and any initial condition $x_0$, the domain of dependence of each element of the vector produced by one step of an integration method applied to Eqn. (2.14) matches the domain of dependence of the corresponding element in the left-hand side vector of Eqn. (2.15). Then the numerical method is said to have an exhaustive domain of dependence. ■
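The sketch below compares the one-step domains of dependence of explicit-Euler, Eqn. (2.16), and implicit-Euler for the same assumed three-stage chain. Instead of summing the series of Eqn. (2.17), it forms $(I - hA)^{-1}$ with a small Gauss-Jordan inverse and reads off the nonzero pattern. The helper names are hypothetical.

```python
def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(M)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for r in range(n):
            if r != k and aug[r][k] != 0.0:
                factor = aug[r][k]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]

def pattern(M):
    """Row j lists the i with M[j][i] != 0: the one-step domain of dependence."""
    return [{i for i, v in enumerate(row) if v != 0.0} for row in M]

def one_step_patterns(A, h):
    n = len(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    explicit = [[eye[i][j] + h * A[i][j] for j in range(n)] for i in range(n)]  # Eqn. (2.16)
    implicit = inverse([[eye[i][j] - h * A[i][j] for j in range(n)] for i in range(n)])
    return pattern(explicit), pattern(implicit)

A = [[-1.0, 0.0, 0.0],
     [10.0, -1.0, 0.0],
     [0.0, 10.0, -1.0]]
ee, ie = one_step_patterns(A, 0.1)
print(ee, ie)
```

In explicit-Euler's pattern, the third variable does not depend on the first one; this is the source of the numerical delay in the chain example above. Implicit-Euler's pattern contains every dependence allowed by the powers of $A$.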

The  following  theorem  relating  domain  of  dependence  to  numerical  delay  follows  directly  from 
the  definitions: 

Theorem 2.2: If a numerical integration method has an exhaustive domain of dependence, then the numerical delay of the integration method is zero for any $A$ and any $x_0$. ■

If one step of a numerical method has a smaller domain of dependence than the original differential equation, then a numerical delay will be introduced. The timesteps used for the calculation will then have to be bounded to ensure rapid enough propagation of variable behavior. Like bounds on the timestep that ensure stability for non-A-stable methods, this additional constraint is difficult to estimate and must be applied very conservatively. The explicit-Euler example above demonstrates how difficult the error is even to observe, because the affected variables are left unperturbed. For this reason, a robust numerical integration algorithm for large systems must either use a method like implicit-Euler, which has an exhaustive domain of dependence, or have some technique for checking that system variables have propagated far enough.


Figure  2.6  -  Stiff  Resistor-Capacitor  Circuit 
CHAPTER  3  -  NUMERICAL  TECHNIQUES

The implicit multistep integration algorithms used in general-purpose circuit simulation programs like SPICE2 [2] and ASTAP [3] have proved to be extremely reliable, but are computationally expensive when applied to large systems. This is because each step of the numerical integration requires the solution of a large implicit nonlinear algebraic system. Two approaches have been used to reduce the computation time required by these methods. Decomposition techniques have been applied to improve the efficiency of the solution of the large algebraic systems generated by implicit integration methods, and less computationally demanding semi-implicit numerical integration algorithms have been developed. In this chapter we will start by demonstrating that the implicit multistep integration algorithms used in general-purpose circuit simulation programs have the three key properties described in Chapter 2: charge conservation, exhaustive domain of dependence, and stiff stability. Next, the relaxation algorithms that have been used in circuit simulators for solving the large nonlinear algebraic systems generated by implicit integration methods will be described. Then the semi-implicit integration methods used in special-purpose programs like MOTIS [7], MOTIS2 [8], and SPLICE [45] will be analyzed with respect to their domain of dependence and stability properties. Finally, we will end this chapter by comparing some of the special-purpose integration algorithms with algebraic relaxation methods.

SECTION  3.1  -  NUMERICAL  INTEGRATION  IN  GENERAL-PURPOSE  SIMULATORS 

Most of the general-purpose circuit simulation programs use implicit multistep integration algorithms applied to the state variable charge (and, if inductances are included, also flux). That is, given a system of the form

$$\frac{d}{dt}\,q(x(t), u(t)) = f(x(t), u(t)) \qquad [3.1]$$
where the terms are as follows:

- $x(t) \in \mathbb{R}^n$ is the system state, usually the vector of node voltages appended by inductor currents;
- $u(t) \in \mathbb{R}^r$ is the vector of inputs and is continuously differentiable with respect to $t$;
- $f: \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$, continuously differentiable, is usually the vector of sums of currents entering a node;
- $q: \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$, continuously differentiable, is usually the vector of node charges or fluxes.

A function $\hat{f}$ is defined such that $\hat{f}(q(x(t), u(t)), u(t)) = f(x(t), u(t))$. Using such an $\hat{f}$, Eqn. (3.1) is converted to a system in normal form,

$$\dot{q}(x(t), u(t)) = \hat{f}(q(x(t), u(t)), u(t)). \qquad [3.2]$$

One of the collection of multistep integration methods is then used to solve Eqn. (3.2). The general form for a multistep integration method applied to Eqn. (3.2) is

$$\sum_{i=0}^{k} \alpha_i\, q(\hat{x}(\tau_{m-i}), u(\tau_{m-i})) = h_m \sum_{j=0}^{l} \beta_j\, \hat{f}(q(\hat{x}(\tau_{m-j}), u(\tau_{m-j})), u(\tau_{m-j})) \qquad [3.3]$$

which is identical to

$$\sum_{i=0}^{k} \alpha_i\, q(\hat{x}(\tau_{m-i}), u(\tau_{m-i})) = h_m \sum_{j=0}^{l} \beta_j\, f(\hat{x}(\tau_{m-j}), u(\tau_{m-j})) \qquad [3.4]$$

Here $k$ and $l$ are positive integers, and $\alpha_0 = 1$. The coefficients $\alpha_i, \beta_j \in \mathbb{R}$ for $0 \le i \le k$, $0 \le j \le l$ depend on the integration method and on the ratio of the timesteps $h_i$, $m - \max(k,l) \le i \le m$. For example, the fixed-timestep explicit-Euler algorithm used for examples in Chapter 2 can be derived from Eqn. (3.4) by setting $k = 1$, $l = 1$, $\alpha_0 = 1$, $\alpha_1 = -1$, $\beta_0 = 0$, and $\beta_1 = 1$. To derive implicit-Euler, the coefficients remain the same except that $\beta_0 = 1$ and $\beta_1 = 0$.

Not all collections of $\alpha$'s and $\beta$'s produce useful numerical integration methods. Consistency is one limitation on the choice of coefficients. It is well known that for a multistep method to be consistent,

$$\sum_{i=1}^{k} \alpha_i = -1 \qquad \text{and} \qquad \sum_{i=1}^{k} i\,\alpha_i + \sum_{j=0}^{l} \beta_j = 0,$$

where it is assumed that $\alpha_0 = 1$ [1]. In addition, if $\beta_0 = 0$ the integration method is said to be explicit; otherwise, the method is implicit.
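The consistency conditions are easy to check mechanically. The sketch below stores each method as its $\alpha$ and $\beta$ coefficient lists in the form of Eqn. (3.4), using exact rational arithmetic. Explicit- and implicit-Euler use the coefficients given above. The trapezoidal rule and the fixed-step second-order backward-differentiation formula (BDF2) are standard methods added here only for illustration, and the last entry is deliberately inconsistent. Function names are hypothetical.

```python
from fractions import Fraction

def is_consistent(alpha, beta):
    """Consistency conditions above, with alpha[0] normalized to 1:
    sum_{i=1..k} alpha_i == -1 and sum_{i=1..k} i*alpha_i + sum_{j=0..l} beta_j == 0."""
    if alpha[0] != 1:
        raise ValueError("coefficients must be normalized so that alpha_0 = 1")
    first = sum(alpha[1:]) == -1
    second = sum(i * a for i, a in enumerate(alpha)) + sum(beta) == 0
    return first and second

def is_explicit(beta):
    """A multistep method is explicit when beta_0 == 0."""
    return beta[0] == 0

half, third = Fraction(1, 2), Fraction(1, 3)
methods = {
    # name: ([alpha_0, ..., alpha_k], [beta_0, ..., beta_l]) as in Eqn. (3.4)
    "explicit-Euler": ([1, -1], [0, 1]),
    "implicit-Euler": ([1, -1], [1, 0]),
    "trapezoidal rule": ([1, -1], [half, half]),
    "BDF2, fixed step": ([1, -4 * third, third], [2 * third]),
    "deliberately inconsistent": ([1, -1], [half, 0]),
}
for name, (alpha, beta) in methods.items():
    print(f"{name}: consistent={is_consistent(alpha, beta)}, explicit={is_explicit(beta)}")
```

The first four methods pass both conditions. The last fails the second condition, because $-1 + \frac{1}{2} \neq 0$.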

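When a multistep method is applied to a system of the form of Eqn. (3.2), the state at the $m$-th step, $\hat{x}(\tau_m)$, is computed by solving

$$q(\hat{x}(\tau_m), u(\tau_m)) - h_m \beta_0\, f(\hat{x}(\tau_m), u(\tau_m)) + \sum_{i=1}^{k} \alpha_i\, q(\hat{x}(\tau_{m-i}), u(\tau_{m-i})) - h_m \sum_{j=1}^{l} \beta_j\, f(\hat{x}(\tau_{m-j}), u(\tau_{m-j})) = 0 \qquad [3.5]$$

for $\hat{x}(\tau_m)$, given $\hat{x}(\tau_j)$, $q(\hat{x}(\tau_j), u(\tau_j))$, and $f(\hat{x}(\tau_j), u(\tau_j))$ for all $j < m$.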

Implicit nonlinear algebraic systems generated by integration methods are usually solved using the iterative Newton-Raphson (NR) method. The NR algorithm is used because it is guaranteed to converge if the initial guess is close enough to the exact solution. Since the exact solution of the differential equation is a continuous function of time, it follows that a timestep can be chosen small enough that the previous timepoint is a sufficiently close initial guess, which ensures that the NR algorithm will converge. Also, the NR algorithm will converge independently of the stiffness of the system, which follows from the observation that the NR algorithm solves a linear problem exactly in one step.
The general Newton-Raphson iteration equation to solve $F(x) = 0$, where $x \in \mathbb{R}^n$ and $F: \mathbb{R}^n \to \mathbb{R}^n$, is

$$ J_F(x^{k-1})\,(x^k - x^{k-1}) = -F(x^{k-1}), \qquad [3.6] $$

where $J_F$ is the Jacobian of $F$ with respect to $x$. The iteration is continued until $\|x^k - x^{k-1}\| < \epsilon$ and $F(x^k)$ is close enough to 0. If the Newton algorithm is used to solve Eqn. (3.5) for $x(\tau_m)$, the residue at the $k$th step, $F(x^k(\tau_m))$, is

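$$ F(x^k(\tau_m)) = q(x^k(\tau_m),u(\tau_m)) - h_m\beta_0\, f(x^k(\tau_m),u(\tau_m)) + \sum_{i=1}^{k} \alpha_i\, q(x(\tau_{m-i}),u(\tau_{m-i})) - h_m \sum_{j=1}^{l} \beta_j\, f(x(\tau_{m-j}),u(\tau_{m-j})), \qquad [3.7] $$

and the Jacobian $J_F(x^k(\tau_m))$ is

$$ J_F(x^k(\tau_m)) = \frac{\partial q}{\partial x}(x^k(\tau_m),u(\tau_m)) - h_m\beta_0\, \frac{\partial f}{\partial x}(x^k(\tau_m),u(\tau_m)). \qquad [3.8] $$

Then $x^{k+1}(\tau_m)$ is derived from $x^k(\tau_m)$ by solving the linear system of equations

$$ J_F(x^k(\tau_m))\,[x^{k+1}(\tau_m) - x^k(\tau_m)] = -F(x^k(\tau_m)). \qquad [3.9] $$

The Newton iteration is continued until sufficient convergence is achieved, that is, until $\|x^{k+1}(\tau_m) - x^k(\tau_m)\| < \epsilon$ and $F(x^k(\tau_m))$ is close enough to zero.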

Note that here, even if the integration algorithm is explicit ($\beta_0 = 0$), Eqn. (3.5) will still be an implicit algebraic problem with respect to $x(\tau_m)$. This occurs because the multistep algorithm was applied using charge as a state variable, and charge is a nonlinear function of $x$.
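To make Eqns. (3.5) to (3.9) concrete, the sketch below applies a one-step method ($k = l = 1$, $\alpha_1 = -1$) in charge form to a single hypothetical node with charge $q(x) = x + x^3/3$ and current $f(x,u) = u - x$; the functions `q`, `f`, and `one_step` are illustrative assumptions, not part of any simulator. It shows the point made above: even with explicit-Euler coefficients ($\beta_0 = 0$) the step needs several Newton iterations, because $q$ is nonlinear.

```python
def q(x):
    """Hypothetical nonlinear node charge q(x) = x + x**3/3 (capacitance 1 + x**2)."""
    return x + x ** 3 / 3.0

def dq_dx(x):
    return 1.0 + x ** 2

def f(x, u):
    """Hypothetical node current through a unit conductance from a source u."""
    return u - x

DF_DX = -1.0  # derivative of f with respect to x

def one_step(x_prev, u_prev, u_now, h, beta0, beta1, tol=1e-12, max_iter=50):
    """Solve Eqn. (3.5) for a one-step method (k = l = 1, alpha_1 = -1) by Newton-Raphson,
    using the previous timepoint as the initial guess. Returns (x, Newton iterations)."""
    known = q(x_prev) + h * beta1 * f(x_prev, u_prev)   # terms from the previous timepoint
    x = x_prev
    for it in range(1, max_iter + 1):
        F = q(x) - h * beta0 * f(x, u_now) - known     # residue, as in Eqn. (3.7)
        J = dq_dx(x) - h * beta0 * DF_DX               # Jacobian, as in Eqn. (3.8)
        dx = -F / J                                    # Newton update, as in Eqn. (3.9)
        x += dx
        if abs(dx) < tol:
            return x, it
    raise RuntimeError("Newton-Raphson did not converge")

x_ee, n_ee = one_step(1.0, 0.0, 0.0, h=0.1, beta0=0.0, beta1=1.0)   # explicit-Euler
x_ie, n_ie = one_step(1.0, 0.0, 0.0, h=0.1, beta0=1.0, beta1=0.0)   # implicit-Euler
print(x_ee, n_ee)
print(x_ie, n_ie)
```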
One of the important reasons for applying the integration method to the system in the form of Eqn. (3.2) is that the charge conservation property of Definition 2.5 holds for any consistent multistep method.

Theorem 3.1: Any consistent multistep method of the form of Eqn. (3.3) has the charge conservation property. ■

Proof  of  Theorem  3.1 

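Let the system of Eqn. (3.1) be of type S, as given in Definition 2.5. To show charge conservation, the vector elements in Eqn. (3.4) are summed to form

$$ \sum_{i=1}^{n} \sum_{j=0}^{k} \alpha_j\, q_i(x(\tau_{m-j}),u(\tau_{m-j})) = h_m \sum_{i=1}^{n} \sum_{j=0}^{l} \beta_j\, f_i(x(\tau_{m-j}),u(\tau_{m-j})). \qquad [3.8] $$

Interchanging summations yields

$$ \sum_{j=0}^{k} \alpha_j \sum_{i=1}^{n} q_i(x(\tau_{m-j}),u(\tau_{m-j})) = h_m \sum_{j=0}^{l} \beta_j \sum_{i=1}^{n} f_i(x(\tau_{m-j}),u(\tau_{m-j})). \qquad [3.9] $$

Since the original system is of type S, $\sum_{i=1}^{n} f_i(x(\tau_{m-j}),u(\tau_{m-j})) = 0$. Substituting into Eqn. (3.9) and using that $\alpha_0 = 1$,

$$ \sum_{i=1}^{n} q_i(x(\tau_m),u(\tau_m)) = -\sum_{j=1}^{k} \alpha_j \sum_{i=1}^{n} q_i(x(\tau_{m-j}),u(\tau_{m-j})). \qquad [3.10] $$

Assume that charge has been conserved up to the $m$th step, that is, $\sum_{i=1}^{n} q_i(x(\tau_j),u(\tau_j)) = K$ for $j < m$. Then, as $\sum_{j=1}^{k} \alpha_j = -1$ because the method is assumed consistent,

$$ \sum_{i=1}^{n} q_i(x(\tau_m),u(\tau_m)) = K, \qquad [3.11] $$

which proves the theorem. ■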
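Theorem 3.1 can be checked on a small example. The sketch below assumes a hypothetical two-node system of type S (the node currents sum to zero) with nonlinear charges $q_i(x_i) = x_i + x_i^3$, and compares explicit-Euler applied to the charge, as in Eqn. (3.4), with explicit-Euler applied to $C(x)\dot x = f(x)$ using $x$ as the integrated variable. In this example only the charge form keeps the total charge constant to round-off.

```python
def q(x):
    """Hypothetical node charges q_i(x_i) = x_i + x_i**3."""
    return [xi + xi ** 3 for xi in x]

def cap(x):
    """Diagonal of dq/dx."""
    return [1.0 + 3.0 * xi ** 2 for xi in x]

def f(x):
    """Two nodes joined by a unit conductance: the currents sum to zero (type S)."""
    i12 = x[1] - x[0]
    return [i12, -i12]

def invert_q(target, guess, tol=1e-14):
    """Solve x_i + x_i**3 = target_i for each i by scalar Newton-Raphson."""
    x = list(guess)
    for i, t in enumerate(target):
        for _ in range(100):
            dx = -(x[i] + x[i] ** 3 - t) / (1.0 + 3.0 * x[i] ** 2)
            x[i] += dx
            if abs(dx) < tol:
                break
    return x

def total_charge_charge_form(x0, h, steps):
    """Explicit-Euler applied to the normal form (3.2): charge is the integrated quantity."""
    x, totals = list(x0), [sum(q(x0))]
    for _ in range(steps):
        q_next = [qi + h * fi for qi, fi in zip(q(x), f(x))]
        x = invert_q(q_next, x)
        totals.append(sum(q(x)))
    return totals

def total_charge_state_form(x0, h, steps):
    """Explicit-Euler applied to C(x) dx/dt = f(x) with x as the integrated quantity."""
    x, totals = list(x0), [sum(q(x0))]
    for _ in range(steps):
        x = [xi + h * fi / ci for xi, fi, ci in zip(x, f(x), cap(x))]
        totals.append(sum(q(x)))
    return totals

charge_form = total_charge_charge_form([1.0, 0.0], 0.1, 10)
state_form = total_charge_state_form([1.0, 0.0], 0.1, 10)
print(max(abs(t - charge_form[0]) for t in charge_form))   # round-off level
print(max(abs(t - state_form[0]) for t in state_form))     # visible drift
```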

Exhaustive domain of dependence is also easy to show for most implicit multistep methods.

Theorem 3.2: Any implicit multistep method with $\alpha_1 \neq 0$ has an exhaustive domain of dependence when applied to a system of the form $\dot x(t) = A x(t)$, where $x(t) \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$. ■

Proof  of  Theorem  3.2 

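The general form for a multistep method applied to $\dot x(t) = Ax(t)$ is

$$ \sum_{i=0}^{k} \alpha_i\, x(\tau_{m-i}) = h_m \sum_{j=0}^{l} \beta_j\, A\, x(\tau_{m-j}). \qquad [3.12] $$

Reorganizing and using the fact that $\alpha_0 = 1$,

$$ x(\tau_m) = [I - h_m\beta_0 A]^{-1} \Big( [-\alpha_1 I + h_m\beta_1 A]\, x(\tau_{m-1}) - \sum_{i=2}^{k} \alpha_i\, x(\tau_{m-i}) + h_m \sum_{j=2}^{l} \beta_j\, A\, x(\tau_{m-j}) \Big). \qquad [3.13] $$

Since the method is implicit, $\beta_0 \neq 0$, and for small $h_m$, $[I - h_m\beta_0 A]^{-1}$ can be expanded to yield

$$ x(\tau_m) = [I + h_m\beta_0 A + (h_m\beta_0 A)^2 + (h_m\beta_0 A)^3 + \cdots]\,[-\alpha_1 I + h_m\beta_1 A]\, x(\tau_{m-1}) + \cdots $$

Because $\alpha_1 \neq 0$, the term $-\alpha_1[I + h_m\beta_0 A + (h_m\beta_0 A)^2 + \cdots]\,x(\tau_{m-1})$ contains every power of $A$ acting on $x(\tau_{m-1})$. Following the same argument as presented in Section 2.2.4, the variable $x_k$ is in the domain of dependence of $x_j(\tau_m)$ if the $(j,k)$ element of $A^n$ is nonzero for some $n$, which matches the differential equation and therefore proves the theorem. ■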
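The mechanism behind Theorem 3.2 can be seen numerically. The sketch below assumes a hypothetical lower-bidiagonal "chain" matrix in which node $i$ drives node $i+1$, starts from a state that is nonzero only at node 0, and records which nodes are nonzero after one step. A single implicit-Euler step ($\alpha_1 = -1 \neq 0$) reaches every downstream node, because $[I - h_m A]^{-1}$ contains all powers of $A$, while explicit-Euler reaches only the immediate neighbor. The small dense solver is included only to keep the example self-contained.

```python
def solve(M, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    A = [row[:] + [bi] for row, bi in zip(M, b)]          # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= factor * A[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                        # back substitution
        s = sum(A[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (A[r][n] - s) / A[r][r]
    return x

def chain(n):
    """Hypothetical test matrix: a_ii = -1 and node i drives node i+1."""
    return [[-1.0 if i == j else (1.0 if i == j + 1 else 0.0) for j in range(n)]
            for i in range(n)]

def explicit_euler_step(A, x, h):
    n = len(x)
    return [x[i] + h * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def implicit_euler_step(A, x, h):
    n = len(x)
    I_minus_hA = [[(1.0 if i == j else 0.0) - h * A[i][j] for j in range(n)]
                  for i in range(n)]
    return solve(I_minus_hA, x)

def support(v):
    """Indices of the nonzero entries of v."""
    return [i for i, vi in enumerate(v) if vi != 0.0]

A = chain(6)
e0 = [1.0] + [0.0] * 5
print(support(explicit_euler_step(A, e0, 0.1)))   # [0, 1]
print(support(implicit_euler_step(A, e0, 0.1)))   # [0, 1, 2, 3, 4, 5]
```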

The general question of the region of stability for multistep integration methods has received considerable attention [1,42,46], and the wealth of material on this question will not be reproduced here. Instead, we will mention the results that are most critical for circuit simulation applications. Perhaps the most important result is that there are no A-stable multistep integration methods whose local truncation error is of order higher than $h^3$. This is known as the Dahlquist barrier [42]. For this reason, the program SPICE [2] uses a combination of the implicit-Euler method mentioned in Chapter 2 and the trapezoidal rule (corresponding to $\alpha_0 = 1$, $\alpha_1 = -1$, $\beta_0 = 0.5$, $\beta_1 = 0.5$) and, as a user option, can also use the variable-order (up to six) backward-difference methods [1]. The program ASTAP [3] uses the variable-order backward-difference methods. The first- and second-order backward-difference methods are A-stable, but the higher order backward-difference integration methods are only stiffly stable. By this, it is meant that the region of stability for these methods includes the negative real axis of $\mathbb{C}$ and some sections of the open left-half plane about the real axis [1].
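The difference between A-stable and conditionally stable methods can be illustrated with the one-step growth factor $R(z)$ of each method on the scalar test equation $\dot x = \lambda x$, with $z = h\lambda$ (a standard construction, not taken from the text); a method is stable at $z$ when $|R(z)| \le 1$. The sample points below are arbitrary choices in the open left-half plane.

```python
def amplification(method, z):
    """Growth factor R(z) of one step on dx/dt = lam * x, with z = h * lam."""
    if method == "explicit-Euler":
        return 1 + z
    if method == "implicit-Euler":
        return 1 / (1 - z)
    if method == "trapezoidal":
        return (1 + z / 2) / (1 - z / 2)
    raise ValueError(method)

samples = (-0.1, -10.0, -1000.0, complex(-1.0, 50.0))
for z in samples:
    print(z, {m: abs(amplification(m, z))
              for m in ("explicit-Euler", "implicit-Euler", "trapezoidal")})
```

At $z = -1000$ the trapezoidal factor is about 0.996 in magnitude: the method is stable there, but very stiff components are damped only weakly.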

SECTION  3.2  -  RELAXATION  DECOMPOSITION 

As mentioned above, the implicit multistep integration methods used in all the general-purpose circuit simulation programs require solving an implicit system of nonlinear algebraic equations at each timestep. The algebraic system is usually cast into the form $F(x) = 0$, where $F: \mathbb{R}^n \to \mathbb{R}^n$ and $x \in \mathbb{R}^n$, which is then solved using the iterative Newton-Raphson (NR) algorithm as in Eqn. (3.6).

The computation of the Newton iterates can be viewed as two pieces: evaluating the function $F$ and its Jacobian $J_F$, and performing a matrix solution. The computational cost of performing the matrix solution grows superlinearly with the size of the problem, as $n^\alpha$, where $n$ is the number of equations in the system and $\alpha > 1$. Circuit simulation programs are intended to handle large circuits, and as the Jacobian matrices are sparse, sparse matrix techniques [40] are used to keep $\alpha$ as low as possible. It has been empirically observed that the time to perform a sparse matrix solution grows as $n^\alpha$ with $1.2 < \alpha < 1.4$ for the matrices associated with circuit simulation problems. The computational cost of a function evaluation grows linearly with the size of the problem, but for circuit simulation problems the evaluation of $F$ and $J_F$ is a complicated task. For each element (transistor, capacitor, resistor, etc.) in the circuit, the currents, the charges, and their derivatives must be evaluated. For example, the evaluation of the currents and charges associated with one MOS transistor requires more than a hundred floating point operations.

Because the computation involved in calculating each transistor's charge and current characteristic is much more complicated than the simpler operations involved in the matrix solution, for small to medium sized problems the function evaluation time dominates the sparse matrix solution time. It is only when the problem involves more than several thousand equations that the matrix solution time dominates. For this reason, the most useful decomposition techniques applied to circuit simulation problems reduce both the matrix solution time and the function evaluation time.

Two approaches to decomposition have been used in circuit simulation programs. The first, which we will not describe in detail here, is referred to as tearing decomposition. For linear equations, tearing is a form of Block LU Factorization [4, 5, 47, 48, 49, 50]. Its application to nonlinear systems has led to Multi-level Newton algorithms [52]. The second approach, closer to the focus of this thesis, has been to apply the various forms of the iterative relaxation-Newton or SOR-Newton algorithms [21, 53].

As background for the relaxation-Newton algorithm, we will present an extremely brief description of the Gauss-Jacobi and Gauss-Seidel relaxation methods, starting with the algorithms for linear systems. A complete discussion can be found in [28].

The linear problem $Ax - b = 0$, where $x = (x_1, \ldots, x_n)^T$, $b = (b_1, \ldots, b_n)^T$, $x_i, b_i \in \mathbb{R}$, and $A = (a_{ij})$, $A \in \mathbb{R}^{n \times n}$, can be solved exactly using Gaussian elimination (with pivoting) provided $A$ is nonsingular. For matrices with certain properties, it is also possible to solve for $x$ in an iterative fashion, where each step of the iteration involves solving a sequence of one-dimensional problems. For example, there is the Gauss-Jacobi relaxation algorithm.


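Algorithm 3.1 (Gauss-Jacobi Algorithm for solving $Ax - b = 0$)

The superscript $k$ is the iteration count and $\epsilon$ is a small positive number.
k ← 0;
Guess some $x^0$.
repeat {
    k ← k + 1
    foreach ( $i \in \{1, \ldots, n\}$ ) {
        $x_i^k = \frac{1}{a_{ii}} \Big[ b_i - \sum_{j=1,\, j \neq i}^{n} a_{ij}\, x_j^{k-1} \Big]$
    }
} until ( $\|x^k - x^{k-1}\| < \epsilon$ )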


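The Gauss-Seidel relaxation algorithm is very similar, and can be generated from Algorithm 3.1 by altering the update equation for $x_i^k$ to

$$ x_i^k = \frac{1}{a_{ii}} \Big[ b_i - \sum_{j=1}^{i-1} a_{ij}\, x_j^{k} - \sum_{j=i+1}^{n} a_{ij}\, x_j^{k-1} \Big]. $$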


The Gauss-Jacobi algorithm can be written in matrix form as

$$ D x^k + (L + U)\, x^{k-1} = b $$

and the Gauss-Seidel algorithm can be written in matrix form as

$$ (L + D)\, x^k + U x^{k-1} = b, $$

where $L, D, U \in \mathbb{R}^{n \times n}$ are strictly lower triangular, diagonal, and strictly upper triangular, respectively, and are such that $A = L + D + U$. Taking the difference between the $k$th and $(k-1)$th iterations, we get

$$ x^k - x^{k-1} = -D^{-1}(L + U)\,(x^{k-1} - x^{k-2}) $$

for Gauss-Jacobi, and

$$ x^k - x^{k-1} = -(L + D)^{-1}U\,(x^{k-1} - x^{k-2}) $$

for Gauss-Seidel. It follows that the Gauss-Jacobi relaxation algorithm will converge if the spectral radius of $D^{-1}(L + U)$ is less than one, that is, if all of its eigenvalues lie inside the unit circle, and the Gauss-Seidel relaxation algorithm will converge if the spectral radius of $(L + D)^{-1}U$ is less than one. This will be true, for example, if $A$ is strictly diagonally dominant [28].
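Algorithm 3.1 and its Gauss-Seidel variant translate directly into code. The sketch below assumes dense list-of-lists storage and uses the max-norm in the stopping test (the text does not fix a norm); the test matrix is a hypothetical strictly diagonally dominant example whose solution is $x = (1, 1, 1)^T$, so both iterations converge, with Gauss-Seidel needing fewer sweeps here.

```python
def gauss_jacobi(A, b, x0, eps=1e-10, max_iter=1000):
    """Algorithm 3.1: every x_i^k is computed from values of iteration k-1 only."""
    n = len(b)
    x_old = list(x0)
    for k in range(1, max_iter + 1):
        x_new = [(b[i] - sum(A[i][j] * x_old[j] for j in range(n) if j != i)) / A[i][i]
                 for i in range(n)]
        if max(abs(x_new[i] - x_old[i]) for i in range(n)) < eps:
            return x_new, k
        x_old = x_new
    raise RuntimeError("Gauss-Jacobi did not converge")

def gauss_seidel(A, b, x0, eps=1e-10, max_iter=1000):
    """Same sweep, but x_j^k (j < i) is used as soon as it has been computed."""
    n = len(b)
    x = list(x0)
    for k in range(1, max_iter + 1):
        delta = 0.0
        for i in range(n):
            new = (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < eps:
            return x, k
    raise RuntimeError("Gauss-Seidel did not converge")

# Hypothetical strictly diagonally dominant system with solution (1, 1, 1).
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
print(gauss_jacobi(A, b, [0.0, 0.0, 0.0]))
print(gauss_seidel(A, b, [0.0, 0.0, 0.0]))
```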

Now consider using the Gauss-Seidel and Gauss-Jacobi relaxation algorithms to solve the nonlinear system

$$ F(x) = 0, \qquad [3.14] $$

where $F(x) = (f_1(x), \ldots, f_n(x))^T$ and $f_i: \mathbb{R}^n \to \mathbb{R}$. At each step of the relaxation, the $x_i$ element is updated by solving the implicit algebraic equation

$$ f_i(x_1^k, \ldots, x_{i-1}^k, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k) = 0 \qquad [3.15a] $$

for Gauss-Jacobi, and

$$ f_i(x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k) = 0 \qquad [3.15b] $$

for Gauss-Seidel.

It is possible to use the Newton-Raphson algorithm to accurately solve the implicit algebraic systems of Eqn. (3.15a) and Eqn. (3.15b) at each step, but this is not essential. That is, it has been shown that the asymptotic rate of convergence of the nonlinear relaxation is not reduced if, rather than solving the implicit algebraic systems at each step, only one iteration of the Newton method is used [21]. The algorithms so generated are referred to as the relaxation-Newton methods. The Gauss-Jacobi-Newton algorithm for solving systems of the form of Eqn. (3.14) is

$$ x_i^{k+1} = x_i^k - \frac{f_i(x^k)}{\dfrac{\partial f_i}{\partial x_i}(x^k)} \qquad [3.16a] $$

and the Gauss-Seidel-Newton algorithm is

$$ x_i^{k+1} = x_i^k - \frac{f_i(x^{k+1,i})}{\dfrac{\partial f_i}{\partial x_i}(x^{k+1,i})}, \qquad [3.16b] $$

where $x^{k+1,i} = (x_1^{k+1}, \ldots, x_{i-1}^{k+1}, x_i^k, \ldots, x_n^k)^T$.
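The relaxation-Newton updates of Eqns. (3.16a) and (3.16b) differ only in which values of $x$ each component sees. The following sketch assumes a hypothetical tridiagonal system $f_i(x) = 4x_i + x_i^3 - x_{i-1} - x_{i+1} - b_i$, whose Jacobian is strictly diagonally dominant everywhere, so the hypotheses of Theorem 3.3 hold; the names `f_i`, `df_i_dxi`, and `relaxation_newton` are illustrative.

```python
def f_i(i, x, b):
    """Hypothetical component f_i(x) = 4 x_i + x_i**3 - x_{i-1} - x_{i+1} - b_i."""
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i + 1 < len(x) else 0.0
    return 4.0 * x[i] + x[i] ** 3 - left - right - b[i]

def df_i_dxi(i, x):
    """Diagonal Jacobian entry 4 + 3 x_i**2, larger than the off-diagonal sum 2."""
    return 4.0 + 3.0 * x[i] ** 2

def relaxation_newton(b, x0, seidel, eps=1e-12, max_iter=500):
    """One Newton step per component per sweep: Eqn. (3.16b) if seidel, else Eqn. (3.16a)."""
    x = list(x0)
    for k in range(1, max_iter + 1):
        old = list(x)                 # x^k
        view = x if seidel else old   # Seidel sees x_j^{k+1} for j < i as soon as it exists
        delta = 0.0
        for i in range(len(x)):
            step = f_i(i, view, b) / df_i_dxi(i, view)
            x[i] = old[i] - step
            delta = max(delta, abs(step))
        if delta < eps:
            return x, k
    raise RuntimeError("relaxation-Newton did not converge")

b = [1.0] * 5
x_gj, k_gj = relaxation_newton(b, [0.0] * 5, seidel=False)
x_gs, k_gs = relaxation_newton(b, [0.0] * 5, seidel=True)
print(k_gj, k_gs, max(abs(f_i(i, x_gs, b)) for i in range(5)))
```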

There is the following general theorem about the local convergence of relaxation-Newton methods.

Theorem 3.3: If a given $F: \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable, and if there exists an $\hat x \in \mathbb{R}^n$ such that $F(\hat x) = 0$, then if the Jacobian of $F$ at $\hat x$, $J_F(\hat x)$, is strictly diagonally dominant, there exists some $\delta > 0$ such that both the Gauss-Jacobi-Newton and the Gauss-Seidel-Newton iterations applied to $F$ will converge for any $x^0$ for which $\|x^0 - \hat x\| < \delta$. ■

The proof of the above well-known theorem can be found in the references [21]. As a direct consequence, we have the following theorem for the nonlinear algebraic systems generated by consistent multistep integration methods.

Theorem 3.4: Let the Gauss-Seidel-Newton or Gauss-Jacobi-Newton relaxation algorithm be used to solve for $x(\tau_m)$ in Eqn. (3.5). If $f(x,u)$ is continuously differentiable, $\frac{\partial q}{\partial x}(x,u)$ is strictly diagonally dominant uniformly over all $x$, and $x(\tau_{m-1})$ is used as the starting point for the relaxation, then there exists an $\bar h > 0$ such that for all $h_m < \bar h$ the relaxation will converge to the solution of Eqn. (3.5). ■

As an intuitive explanation for why Theorem 3.4 should be true, and why nonconvergence can occur at all, consider implicit-Euler applied to Eqn. (2.2) with $C(x,u) = C$, where $C$ is strictly diagonally dominant:

$$ C x(\tau_m) = C x(\tau_{m-1}) + h_m f(x(\tau_m),u(\tau_m)). \qquad [3.17] $$

In the limit as $h_m \to \infty$, solving Eqn. (3.17) for $x(\tau_m)$ by relaxation becomes equivalent to solving $f(x(\tau_m),u(\tau_m)) = 0$ by relaxation. Since little is assumed about $f$ other than Lipschitz continuity, it is unlikely that this problem can be solved, in general, with a relaxation method. However, in the limit as the timestep becomes small, Eqn. (3.17) becomes

$$ C x(\tau_m) = b, $$

where $b = C x(\tau_{m-1})$. This problem can be solved by relaxation because $C$ is strictly diagonally dominant. We formalize this observation in the proof of Theorem 3.4.

Proof  of  Theorem  3.4: 

It is sufficient to show that the system of Eqn. (3.5) will satisfy the conditions of Theorem 3.3 for small enough $h_m$. The Jacobian of the function defined by Eqn. (3.5), $J_F$, is given in Eqn. (3.8). That $J_F$ is strictly diagonally dominant for $h_m$ small enough follows directly from the observation that, in the limit as $h_m \to 0$, $J_F$ approaches $\frac{\partial q}{\partial x}$, which is a strictly diagonally dominant matrix by assumption. That $x(\tau_{m-1})$ is close enough to the solution of Eqn. (3.5) for a small enough $h_m$ follows from two observations: the consistency of the multistep method implies that $x(\tau_m) = x(\tau_{m-1})$ is a solution to Eqn. (3.5) for $h_m = 0$, and the Lipschitz continuity of $q$ and $f$ implies that $x(\tau_m)$ is a continuous function of $h_m$. ■
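The behavior described in the intuitive explanation of Theorem 3.4 can be reproduced on a linear special case, $f(x,u) = Gx$. In the sketch below, $C$ and $G$ are hypothetical $2 \times 2$ matrices: $C$ is strictly diagonally dominant and $G$ is not. Gauss-Jacobi relaxation on the implicit-Euler equation $(C - h_m G)\,x(\tau_m) = C x(\tau_{m-1})$, started from $x(\tau_{m-1})$, converges for a small timestep and diverges for a large one.

```python
def jacobi_relax(M, rhs, x0, eps=1e-10, max_iter=100):
    """Gauss-Jacobi relaxation on M x = rhs; returns (x, converged)."""
    n = len(rhs)
    x = list(x0)
    for _ in range(max_iter):
        new = [(rhs[i] - sum(M[i][j] * x[j] for j in range(n) if j != i)) / M[i][i]
               for i in range(n)]
        if max(abs(new[i] - x[i]) for i in range(n)) < eps:
            return new, True
        x = new
    return x, False

C = [[1.0, 0.5], [0.5, 1.0]]        # strictly diagonally dominant (hypothetical)
G = [[-1.0, 3.0], [-3.0, -1.0]]     # f(x) = G x, not diagonally dominant (hypothetical)

def implicit_euler_by_relaxation(x_prev, h):
    """Relax (C - h G) x = C x_prev, starting from x(tau_{m-1}) as in Theorem 3.4."""
    M = [[C[i][j] - h * G[i][j] for j in range(2)] for i in range(2)]
    rhs = [sum(C[i][j] * x_prev[j] for j in range(2)) for i in range(2)]
    return jacobi_relax(M, rhs, x_prev)

print(implicit_euler_by_relaxation([1.0, 0.0], 0.01)[1])   # small timestep: converges
print(implicit_euler_by_relaxation([1.0, 0.0], 10.0)[1])   # large timestep: diverges
```

Here the Jacobi iteration matrix has spectral radius about 0.49 at $h_m = 0.01$ and about 2.7 at $h_m = 10$.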

The relaxation-Newton methods have become popular for solving circuit simulation problems for two reasons. The first is that, as mentioned in Chapter 2, for a broad class of circuits the capacitance matrix is diagonally dominant, and therefore the relaxation-Newton algorithms are guaranteed to converge if the timestep is made small enough. Unlike the standard NR method, however, the timestep required is not truly independent of the problem stiffness, an issue that will be discussed more thoroughly at the end of this chapter. The second reason for the popularity of the relaxation-Newton methods is that, with proper application, it is possible both to avoid matrix solutions and to reduce the computation involved in function evaluation. As the system Jacobian is sparse, the $i$th component of the function $F$ defined in Eqn. (3.5), $F_i$, will be a function of only a few components of the vector $x$. During the relaxation-Newton process this sparsity can be exploited by noting whether or not the components of $x$ on which $F_i$ depends have changed significantly and, if none of them have, not reevaluating $F_i$. In addition, if $F_i$ is close enough to 0, $x_i^{k+1}$ will be equal to $x_i^k$ and need not be recomputed.

If implemented as described above, such a partial evaluation scheme involves substantial checking to see whether $F_i$ should be reevaluated. This checking can overwhelm the savings due to partial function evaluation. To avoid this, practical relaxation-Newton algorithms are implemented using a selective trace technique [33] that simultaneously determines the order in which the relaxation equations are solved and the portion of the function that must be recomputed.
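The bookkeeping behind this idea can be sketched as an event queue: a component is re-solved only when it, or a component it reads, has just moved by more than a tolerance. The sketch below is a simplified, hypothetical illustration of that idea, not the selective trace algorithm of [33]; the names `selective_trace_relax`, `fanout`, and `tol`, and the tridiagonal test system, are all assumptions. After a change to the input of equation 0 alone, components far down the chain are never re-evaluated.

```python
from collections import deque

def selective_trace_relax(f, df_dxi, fanout, x, changed, tol=1e-10, max_events=100000):
    """Event-driven relaxation-Newton (simplified, hypothetical illustration).

    f(i, x) and df_dxi(i, x) give component i of F and its diagonal Jacobian entry;
    fanout[i] lists the components whose equations read x_i; changed lists the
    components whose inputs were perturbed. Components are processed in FIFO order,
    always using the latest values of x (as in Gauss-Seidel-Newton).
    """
    queue, queued, touched, events = deque(changed), set(changed), set(), 0
    while queue:
        i = queue.popleft()
        queued.discard(i)
        touched.add(i)
        events += 1
        if events > max_events:
            raise RuntimeError("too many events")
        step = f(i, x) / df_dxi(i, x)      # one Newton step on component i
        x[i] -= step
        if abs(step) > tol:                # x_i moved: wake itself and its fanouts
            for j in [i] + fanout[i]:
                if j not in queued:
                    queue.append(j)
                    queued.add(j)
    return x, touched, events

n = 50
b = [0.0] * n
b[0] = 1.0   # only the input of equation 0 changes; x = 0 solved the old problem

def f(i, x):
    left = x[i - 1] if i > 0 else 0.0
    right = x[i + 1] if i + 1 < n else 0.0
    return 4.0 * x[i] + x[i] ** 3 - left - right - b[i]

def df_dxi(i, x):
    return 4.0 + 3.0 * x[i] ** 2

fanout = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
x, touched, events = selective_trace_relax(f, df_dxi, fanout, [0.0] * n, changed=[0])
print(len(touched), events, max(abs(f(i, x)) for i in range(n)))
```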

SECTION  3.3  -  SEMI-IMPLICIT  NUMERICAL  INTEGRATION  METHODS 

Although certain implicit multistep integration methods have all the desirable properties described in Chapter 2, they are computationally expensive when applied to very large systems, partly because each timepoint requires a large matrix solution. Semi-implicit integration methods, as the name implies, are constructed to be as implicit as possible without making it necessary to perform standard matrix solutions to compute the timepoints. In this section we will discuss three semi-implicit methods, all of which have been used in circuit simulation applications. In order to simplify the presentation of these algorithms, they will be considered as applied to the following test problem,

$$ \dot x(t) = A x(t), \qquad x(0) = x_0, \qquad [3.18] $$

where $x(t) \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n \times n}$. The properties of these algorithms with respect to domain of dependence and stiff stability will be considered. This test problem is too simple to indicate the integration methods' charge conservation properties, and that issue will not be considered.

The simplest of the semi-implicit methods is the following mixture of explicit- and implicit-Euler [5,7]:

$$ x(\tau_m) = x(\tau_{m-1}) + h_m[\, D x(\tau_m) + (L + U)\, x(\tau_{m-1}) \,], \qquad [3.19] $$

where $L, D, U \in \mathbb{R}^{n \times n}$ are strictly lower triangular, diagonal, and strictly upper triangular, respectively, and are such that $A = L + D + U$. Note that this algorithm is identical to solving the algebraic equations generated by implicit-Euler applied to Eqn. (3.18) with one iteration of a Gauss-Jacobi relaxation scheme, and therefore the algorithm is referred to as the Jacobi-semi-implicit method. Solving for $x(\tau_m)$ leads to

$$ x(\tau_m) = (I - h_m D)^{-1}\,[I + h_m(L + U)]\, x(\tau_{m-1}). \qquad [3.20] $$

Since $(I - h_m D)$ is diagonal, its inverse, if it exists, can be computed trivially. In addition, we have the following stability result (see [6] for similar results).

Theorem 3.5: If the matrix $A$ in Eqn. (3.18) is diagonally dominant with negative diagonal entries, or $A$ is lower or upper triangular, then the region of stability for the Jacobi-semi-implicit method is the open left-half plane of $\mathbb{C}$. ■

This  theorem  is  of  practical  value  because  the  systems  of  differential  equations  that  describe  circuits 
with  resistors  and  grounded  capacitors  will  be  of  the  form  of  Eqn.  (3.18)  and  will  have  the  diagonal 
dominance  property. 


Proof  of  Theorem  3.5: 

To prove the theorem it is sufficient to show that the matrix $M$ defined by

$$ M = (I - h_m D)^{-1}\,[I + h_m(L + U)] \qquad [3.22] $$

has spectral radius $\rho(M) < 1$ if $A$ is diagonally dominant and has negative diagonal entries, or if $A$ is upper or lower triangular and has its eigenvalues in the open left-half plane of $\mathbb{C}$. If $A$ is upper or lower triangular, the eigenvalues of $A$ are its diagonal entries, which must be negative by assumption. If $A$ is triangular, $M$ will be triangular, and the eigenvalues of $M$ will be its diagonal entries. The $i$th diagonal entry of $M$ can be calculated explicitly, and is $\frac{1}{1 + h_m |a_{ii}|}$, which is less than 1.

To prove the theorem for the case where $A$ is diagonally dominant and has negative diagonal entries, we use the fact that the spectral radius is bounded by any induced norm. In particular,

$$ \rho(M) \le \|M\|_\infty = \max_i \sum_{j=1}^{n} |m_{ij}|, $$

which can be calculated from

$$ \sum_{j=1}^{n} |m_{ij}| = \frac{1 + h_m \sum_{j=1,\, j \neq i}^{n} |a_{ij}|}{1 + h_m |a_{ii}|}, $$

and is less than 1 by the diagonal dominance property of $A$. Therefore, the eigenvalues of $M$ are less than one in magnitude. ■
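The bound used in the proof can be evaluated directly. The sketch below assumes a hypothetical stiff, strictly diagonally dominant $3 \times 3$ matrix with negative diagonal, computes $\|M\|_\infty$ from the row-sum formula above, and runs 20 steps of the Jacobi-semi-implicit method, Eqn. (3.20), alongside explicit-Euler with the same timestep.

```python
def jacobi_semi_implicit_step(A, x, h):
    """Eqn. (3.20): row i is (x_i + h * sum_{j != i} a_ij x_j) / (1 - h a_ii)."""
    n = len(x)
    return [(x[i] + h * sum(A[i][j] * x[j] for j in range(n) if j != i)) / (1.0 - h * A[i][i])
            for i in range(n)]

def explicit_euler_step(A, x, h):
    n = len(x)
    return [x[i] + h * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def inf_norm_M(A, h):
    """Row-sum formula for ||M||_inf from the proof of Theorem 3.5 (needs a_ii < 0)."""
    n = len(A)
    return max((1.0 + h * sum(abs(A[i][j]) for j in range(n) if j != i)) / (1.0 - h * A[i][i])
               for i in range(n))

# Hypothetical stiff, strictly diagonally dominant matrix with negative diagonal.
A = [[-1000.0, 500.0, 0.0],
     [1.0, -3.0, 1.0],
     [0.0, 2.0, -4.0]]
h = 0.1
x_si = x_ee = [1.0, 1.0, 1.0]
for _ in range(20):
    x_si = jacobi_semi_implicit_step(A, x_si, h)
    x_ee = explicit_euler_step(A, x_ee, h)
print(inf_norm_M(A, h), max(map(abs, x_si)), max(map(abs, x_ee)))
```

With $h_m = 0.1$ the bound is about 0.92, so the semi-implicit solution decays, while explicit-Euler is driven unstable by the stiff first row.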

Although the stability of the Jacobi-semi-implicit integration method is substantially better than that of the explicit-Euler algorithm used in Chapter 2, particularly for almost diagonal problems, the domains of dependence are identical. This can be seen by comparing Eqn. (3.19) to Eqn. (2.16). It is possible to construct semi-implicit integration methods that have larger domains of dependence than the Jacobi-semi-implicit integration method without requiring a matrix solution. In particular, there is the Seidel-semi-implicit method,

$$ x(\tau_m) = x(\tau_{m-1}) + h_m[\, (L + D)\, x(\tau_m) + U x(\tau_{m-1}) \,]. \qquad [3.23] $$


Solving for $x(\tau_m)$ leads to

$$ x(\tau_m) = (I - h_m(L + D))^{-1}\,(I + h_m U)\, x(\tau_{m-1}), \qquad [3.24] $$

where $(I - h_m(L + D))^{-1}$ is easy to apply because the matrix is triangular. The Seidel-semi-implicit method has stability properties that are similar to those of the Jacobi-semi-implicit method.

Theorem 3.6: If the matrix $A$ in Eqn. (3.18) is diagonally dominant, with negative diagonal entries, or if $A$ is lower or upper triangular, then the region of stability for the Seidel-semi-implicit method is the open left-half plane of $\mathbb{C}$. ■

For the case of $A$ diagonally dominant with negative diagonals, Theorem 3.6 follows from arguments similar to those used to prove Theorem 3.5. If $A$ is lower triangular, the Seidel-semi-implicit algorithm is identical to implicit-Euler, which is A-stable, and if $A$ is upper triangular the algorithm is identical to the Jacobi-semi-implicit algorithm.

The Seidel-semi-implicit method does not have obviously better stability properties than the Jacobi-semi-implicit method, but it has the clear advantage of a larger domain of dependence. To see this, consider the expansion of $(I - h_m(L + D))^{-1}$ in Eqn. (3.24) for small $h_m$,

$$ x(\tau_m) = [I + h_m(L + D) + h_m^2(L + D)^2 + h_m^3(L + D)^3 + \cdots]\,[I + h_m U]\, x(\tau_{m-1}). \qquad [3.25] $$

If  A  is  lower  triangular,  the  domain  of  dependence  of  the  Seidel-semi-implicit  method  is  exhaustive. 
As  long  as  the  lower  triangular  portion  of  A  is  nonzero,  the  domain  of  dependence  of  the  Seidel- 
semi-implicit  method  will  be  larger  than  that  of  the  Jacobi-semi-implicit  method. 
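The domain-of-dependence argument can be seen on hypothetical chain matrices (names and data below are illustrative). Starting from a state that is nonzero at a single node, one Seidel-semi-implicit step, computed by forward substitution as in Eqn. (3.24), spreads nonzero values along the whole chain when the coupling is lower triangular, but only to the neighbor when it is upper triangular, which is also all that one Jacobi-semi-implicit step achieves.

```python
def jacobi_semi_implicit_step(A, x, h):
    """Eqn. (3.20)."""
    n = len(x)
    return [(x[i] + h * sum(A[i][j] * x[j] for j in range(n) if j != i)) / (1.0 - h * A[i][i])
            for i in range(n)]

def seidel_semi_implicit_step(A, x, h):
    """Eqn. (3.24) by forward substitution: row i already uses the new x_j, j < i."""
    n = len(x)
    new = [0.0] * n
    for i in range(n):
        s = x[i] + h * sum(A[i][j] * x[j] for j in range(i + 1, n))   # (I + h U) x_{m-1}
        s += h * sum(A[i][j] * new[j] for j in range(i))              # + h L x_m
        new[i] = s / (1.0 - h * A[i][i])
    return new

def lower_chain(n):
    """Hypothetical A: a_ii = -1 and node i drives node i+1."""
    return [[-1.0 if i == j else (1.0 if i == j + 1 else 0.0) for j in range(n)]
            for i in range(n)]

def upper_chain(n):
    """Hypothetical A: a_ii = -1 and node i+1 drives node i."""
    return [[-1.0 if i == j else (1.0 if j == i + 1 else 0.0) for j in range(n)]
            for i in range(n)]

def support(v):
    return [i for i, vi in enumerate(v) if vi != 0.0]

n, h = 5, 0.1
first = [1.0] + [0.0] * (n - 1)
last = [0.0] * (n - 1) + [1.0]
print(support(jacobi_semi_implicit_step(lower_chain(n), first, h)))   # [0, 1]
print(support(seidel_semi_implicit_step(lower_chain(n), first, h)))   # [0, 1, 2, 3, 4]
print(support(seidel_semi_implicit_step(upper_chain(n), last, h)))    # [3, 4]
```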

The Seidel-semi-implicit method includes the domain of dependence due to arbitrarily high powers of the lower triangular portion of $A$. The next semi-implicit method we will consider, the symmetric displacement algorithm [54,6], also includes the domain of dependence due to arbitrarily high powers of the upper triangular portion of $A$. Applied to Eqn. (3.18), the symmetric displacement algorithm is the following two-step process,

$$ x(\tau_{m-1/2}) = x(\tau_{m-1}) + 0.25\, h_m[\, (2L + D)\, x(\tau_{m-1/2}) + (D + 2U)\, x(\tau_{m-1}) \,], \qquad [3.26a] $$

$$ x(\tau_m) = x(\tau_{m-1/2}) + 0.25\, h_m[\, (2U + D)\, x(\tau_m) + (D + 2L)\, x(\tau_{m-1/2}) \,]. \qquad [3.26b] $$

Note that if $A$ is diagonal, each half-step of the symmetric displacement algorithm is precisely the trapezoidal rule with step $h_m/2$.

The symmetric displacement algorithm has several important properties. The local truncation error is of order $h^3$, unlike the other semi-implicit methods, whose error is of order $h^2$ [6]. In addition, it has the stability properties given in the following theorem.

Theorem 3.7: If the matrix $A$ in Eqn. (3.18) is strictly diagonally dominant, with negative diagonal entries, or if $A$ is symmetric, lower triangular, or upper triangular, then the region of stability for the symmetric displacement method is the open left-half plane of $\mathbb{C}$. ■

The proof of Theorem 3.7 for the case where $A$ is strictly diagonally dominant with negative diagonal terms follows from the same reasoning as used in the proof of Theorem 3.5. The proof for the case of $A$ symmetric can be found in [6].

As indicated by Theorem 3.7, the stability properties of the symmetric displacement algorithm are better for nearly symmetric problems than those of the Seidel-semi-implicit method, but symmetric displacement has a smaller region of stability if the problem is almost lower triangular. The symmetric displacement algorithm is superior to the Seidel-semi-implicit method in two important aspects: its local truncation error is of a higher order, and it has a larger domain of dependence for problems that are not lower triangular. To show this, Eqn. (3.26a) and Eqn. (3.26b) are reorganized as

$$ x(\tau_m) = [I - 0.25\, h_m(D + 2U)]^{-1}\,[I + 0.25\, h_m(D + 2L)]\,[I - 0.25\, h_m(D + 2L)]^{-1}\,[I + 0.25\, h_m(D + 2U)]\, x(\tau_{m-1}). \qquad [3.27] $$

The expansion of $[I - 0.25\, h_m(D + 2L)]^{-1}$ will include all the powers of $L$, and the expansion of $[I - 0.25\, h_m(D + 2U)]^{-1}$ will include all the powers of $U$. Note that this does not mean that the symmetric displacement algorithm has an exhaustive domain of dependence. For example, the domain of dependence of $A^2$ is not necessarily the same as that of $L^2 + U^2$; there are possibly additional dependencies due to the cross-product terms $LU$ and $UL$.
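The two half-steps of Eqns. (3.26a) and (3.26b) require only a forward and a backward triangular substitution. The sketch below is a minimal dense implementation under that reading (the function names and test matrices are illustrative); it checks that for a diagonal $A$ the result equals two trapezoidal steps of size $h_m/2$, and that hypothetical chains coupled through $L$ alone or through $U$ alone are both fully reached in a single step.

```python
def split(A):
    """A = L + D + U with L strictly lower, D diagonal, U strictly upper triangular."""
    n = len(A)
    L = [[A[i][j] if j < i else 0.0 for j in range(n)] for i in range(n)]
    D = [[A[i][j] if j == i else 0.0 for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]
    return L, D, U

def forward_solve(M, r):
    """Solve M y = r for lower-triangular M."""
    y = [0.0] * len(r)
    for i in range(len(r)):
        y[i] = (r[i] - sum(M[i][j] * y[j] for j in range(i))) / M[i][i]
    return y

def backward_solve(M, r):
    """Solve M y = r for upper-triangular M."""
    n = len(r)
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (r[i] - sum(M[i][j] * y[j] for j in range(i + 1, n))) / M[i][i]
    return y

def symmetric_displacement_step(A, x, h):
    """One step of Eqns. (3.26a)-(3.26b) applied to dx/dt = A x."""
    n, c = len(x), 0.25 * h
    L, D, U = split(A)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # (3.26a): [I - c(2L + D)] x_half = [I + c(D + 2U)] x, a lower-triangular solve
    M1 = [[eye[i][j] - c * (2 * L[i][j] + D[i][j]) for j in range(n)] for i in range(n)]
    r1 = [x[i] + c * sum((D[i][j] + 2 * U[i][j]) * x[j] for j in range(n)) for i in range(n)]
    x_half = forward_solve(M1, r1)
    # (3.26b): [I - c(2U + D)] x_new = [I + c(D + 2L)] x_half, an upper-triangular solve
    M2 = [[eye[i][j] - c * (2 * U[i][j] + D[i][j]) for j in range(n)] for i in range(n)]
    r2 = [x_half[i] + c * sum((D[i][j] + 2 * L[i][j]) * x_half[j] for j in range(n))
          for i in range(n)]
    return backward_solve(M2, r2)

x_diag = symmetric_displacement_step([[-2.0, 0.0], [0.0, -5.0]], [1.0, 1.0], 0.3)
chain_L = [[-1.0, 0.0, 0.0, 0.0], [1.0, -1.0, 0.0, 0.0],
           [0.0, 1.0, -1.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
chain_U = [list(row) for row in zip(*chain_L)]   # transpose: coupling through U only
from_first = symmetric_displacement_step(chain_L, [1.0, 0.0, 0.0, 0.0], 1.0)
from_last = symmetric_displacement_step(chain_U, [0.0, 0.0, 0.0, 1.0], 1.0)
print(x_diag, from_first, from_last)
```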

None of the semi-implicit methods mentioned above match the stiffly stable implicit multistep methods for either stability or domain of dependence. However, they have proved to be extremely useful for a variety of circuit simulation applications where either the problem is not very stiff, or it is of a mostly diagonal or lower triangular form. For this reason, extensions of the semi-implicit methods mentioned above to the case where $C(x,u)$ is not diagonal have been pursued [55,6]. Similar results about the region of stability for these extensions have been shown.

SECTION  3.4  -  RELAXATION  VS  SEMI-IMPLICIT  INTEGRATION 

The relaxation-Newton algorithms described in Section 3.2 impose a bound on the numerical integration timestep to ensure that the relaxation converges. This bound is similar to the bound on the timestep needed to ensure stability of the semi-implicit numerical integration methods. In order to demonstrate briefly the similarities of the two approaches, we will end this chapter by comparing the simplest of each type of method: the Jacobi-relaxation algorithm applied to solving the implicit-Euler equation, and the Jacobi-semi-implicit algorithm. Again, to keep the analyses simple, we will use the test problem of Eqn. (3.18).

The timepoint update equation for the Jacobi-semi-implicit algorithm is

$$ x(\tau_m) = (I - h_m D)^{-1}\,[I + h_m(L + U)]\, x(\tau_{m-1}). \qquad [3.27] $$

The iteration update equation of the Jacobi relaxation applied to implicit-Euler is

$$ x^{k+1}(\tau_m) - x^k(\tau_m) = (I - h_m D)^{-1}\,[h_m(L + U)]\,[x^k(\tau_m) - x^{k-1}(\tau_m)]. \qquad [3.28] $$


The semi-implicit method will be stable if

$$ \rho\big[(I - h_m D)^{-1}(I + h_m(L + U))\big] < 1, $$

and the relaxation will converge if

$$ \rho\big[(I - h_m D)^{-1}(h_m(L + U))\big] < 1. $$

Both spectral radii will be less than 1 for any $h_m$ if $A$ is diagonally dominant and has negative diagonal elements. If $A$ is not diagonally dominant, but has negative diagonal elements, which method allows the larger timestep will depend on the signs and magnitudes of the lower and upper triangular portions of $A$.
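A $2 \times 2$ example shows that the two spectral radii need not agree when $A$ is not diagonally dominant. The sketch below assumes the hypothetical matrix $A = \begin{pmatrix} -1 & 2 \\ -2 & -1 \end{pmatrix}$ and computes both radii from the characteristic polynomial. For this particular $A$, the relaxation converges at timesteps for which the semi-implicit method is already unstable; a single example does not, of course, settle the general comparison.

```python
import cmath

def spectral_radius_2x2(M):
    """Largest |eigenvalue| of a 2x2 matrix, from its characteristic polynomial."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def semi_implicit_matrix(A, h):
    """(I - hD)^{-1} [I + h(L + U)] for a 2x2 A."""
    return [[(1.0 if i == j else h * A[i][j]) / (1.0 - h * A[i][i]) for j in range(2)]
            for i in range(2)]

def relaxation_matrix(A, h):
    """(I - hD)^{-1} [h(L + U)] for a 2x2 A."""
    return [[(0.0 if i == j else h * A[i][j]) / (1.0 - h * A[i][i]) for j in range(2)]
            for i in range(2)]

A = [[-1.0, 2.0], [-2.0, -1.0]]   # negative diagonal, not diagonally dominant
for h in (0.5, 0.8, 2.0):
    print(h, spectral_radius_2x2(semi_implicit_matrix(A, h)),
          spectral_radius_2x2(relaxation_matrix(A, h)))
```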

Although the size of the largest allowable timestep does not conclusively favor semi-implicit integration methods or relaxation methods, relaxation methods are clearly superior with respect to the relative domains of dependence. By carrying the relaxation iteration to convergence, it is assured that the information at a given timestep has propagated "far enough". Therefore, relaxation methods have the exhaustive domain of dependence property and, as described above, the semi-implicit methods do not.
CHAPTER  4  -  THE  WAVEFORM  RELAXATION  ALGORITHM

The multistep numerical integration algorithms for solving ODE systems can become inefficient for large systems where different state variables are changing at very different rates. This is because the direct application of the integration method forces every differential equation in the system to be discretized identically, and this discretization must be fine enough that the fastest changing state variable in the system is accurately represented. If it were possible to pick different discretization points, or timesteps, for each differential equation in the system so that each could use the largest timestep that would accurately reflect the behavior of its associated state variable, then the efficiency of the simulation would be greatly improved. This is referred to as the multirate problem [1], and numerical integration methods that allow different state variables to use different timesteps are called multirate integration methods.

The selective trace technique for improving the efficiency of relaxation-Newton methods (Section 3.2) can be thought of as a limited multirate integration method. If, at a given timestep, the $x_i$ variable is at its equilibrium (or stationary) point, and the variables on which $x_i$ depends do not change, then $x_i$ will retain the value it had before the timestep. In fact, $x_i$ will never be recomputed until some $x_j$ on which it depends changes. If $x_i$ is bypassed for several timesteps, the effect is the same as if a large timestep were used to compute $x_i$. Therefore, the selective trace algorithm exploits the kind of multirate behavior that stems from a system in which most of the variables remain at an equilibrium state. The selective trace algorithm cannot, however, exploit the multirate behavior of a system whose state variables have different rates of motion but are not at equilibrium.

Techniques based on semi-implicit integration algorithms have been used both to achieve the kind of limited multirate integration described above and to achieve full multirate integration methods [4,57]. However, as pointed out in Section 3.3, the semi-implicit integration algorithms do not have all of the properties that make a numerical method for circuit simulation robust. A different approach is to decompose the differential equations before introducing discrete approximations. If the decomposed differential equations are solved independently, the numerical integration method used for each subsystem can pick its own timestep, thereby achieving full multirate integration. In addition, since any numerical integration algorithm can be used to solve the decomposed systems, one that retains all the desirable numerical properties described in Chapter 2 can be used.

One method for decomposing differential equations is the family of Waveform Relaxation (WR) algorithms [11]. WR algorithms have attracted considerable attention because of their favorable numerical properties and their successful application to the simulation of Metal-Oxide-Semiconductor (MOS) digital circuits. In this chapter the theoretical basis for the WR algorithm will be presented. Waveform relaxation will be introduced with a simple example, which will be followed by the general algorithm applied to systems of the form of Eqn. (2.2). Then a new proof of convergence, one that demonstrates that the WR algorithm is a contraction mapping in a particular norm, will be presented. Extensions to the basic algorithm that allow for modified iteration equations (including discrete approximations) will be presented, and it will be shown that the convergence of such extensions follows directly from the proof that the WR algorithm is a contraction mapping. We will end this chapter by presenting a variant of the WR algorithm, the waveform relaxation-Newton (WRN) algorithm, which is the extension to nonlinear differential equations of the relaxation-Newton algorithm presented in Section 3.2.

SECTION 4.1 - THE BASIC WR ALGORITHM

We will start this section with a simple illustrative example, and then present the general WR algorithm. Consider the first-order two-dimensional differential equation in $x(t) \in \mathbb{R}^2$ on $t \in [0,T]$:

$$ \dot{x}_1 = f_1(x_1, x_2, t), \qquad x_1(0) = x_{10} \tag{4.1a} $$

$$ \dot{x}_2 = f_2(x_1, x_2, t), \qquad x_2(0) = x_{20} \tag{4.1b} $$

The basic idea of the waveform-relaxation algorithm is to fix the waveform $x_2: [0,T] \to \mathbb{R}$ and solve Eqn. (4.1a) as a one-dimensional differential equation in $x_1(t)$. The solution thus obtained for $x_1(t)$ can be substituted into Eqn. (4.1b), which then reduces to another first-order differential equation in one variable, $x_2(t)$. Eqn. (4.1a) is then re-solved using the new solution for $x_2(t)$, and the procedure is repeated.

Alternatively, fix the waveform $x_2(t)$ in Eqn. (4.1a) and fix $x_1(t)$ in Eqn. (4.1b), and solve both one-dimensional differential equations simultaneously. Then use the solution obtained for $x_2$ from Eqn. (4.1b) in Eqn. (4.1a), and the solution obtained for $x_1$ from Eqn. (4.1a) in Eqn. (4.1b), and re-solve both equations.

In this fashion, two iterative algorithms have been constructed. Each replaces the problem of solving a differential equation in two variables by one of solving a sequence of differential equations in one variable. As described above, these two waveform relaxation algorithms can be seen as the analogues of the Gauss-Seidel and the Gauss-Jacobi techniques for solving nonlinear algebraic equations. Here, however, the unknowns are waveforms (elements of a function space) rather than real variables. In this sense, the algorithms are techniques for time-domain decoupling of differential equations.
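The two iterations can be sketched on a small linear instance of Eqn. (4.1). The test system, its coefficients, the interval, and the use of backward Euler on a fixed grid for each one-dimensional solve are all assumptions made for illustration. Because each scalar equation is integrated on the same grid, the converged waveforms should agree with backward Euler applied to the coupled system, which the sketch uses as a check.

```python
import numpy as np

# Hypothetical linear instance of Eqn. (4.1):
#   x1' = -2 x1 + x2,   x2' = x1 - 2 x2,   x(0) = (1, 0),   t in [0, T].
A = np.array([[-2.0, 1.0], [1.0, -2.0]])
x0 = np.array([1.0, 0.0])
T, N = 1.0, 200
h = T / N


def solve_scalar(a_ii, coupling, xi0):
    """Backward Euler for xi' = a_ii * xi + coupling(t); the coupling
    waveform is a fixed, known array sampled on the time grid."""
    xi = np.empty(N + 1)
    xi[0] = xi0
    for m in range(1, N + 1):
        xi[m] = (xi[m - 1] + h * coupling[m]) / (1.0 - h * a_ii)
    return xi


def waveform_relaxation(gauss_seidel, tol=1e-12, max_iter=200):
    """Gauss-Seidel (True) or Gauss-Jacobi (False) waveform relaxation."""
    x = np.tile(x0[:, None], (1, N + 1))      # initial guess x^0(t) = x0
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        # Solve for x1 with the previous x2 waveform held fixed.
        x[0] = solve_scalar(A[0, 0], A[0, 1] * x_old[1], x0[0])
        # Gauss-Seidel uses the new x1 immediately; Gauss-Jacobi uses the old one.
        x1_src = x[0] if gauss_seidel else x_old[0]
        x[1] = solve_scalar(A[1, 1], A[1, 0] * x1_src, x0[1])
        if np.max(np.abs(x - x_old)) < tol:
            return x, k
    return x, max_iter


def backward_euler_coupled():
    """Reference: backward Euler applied to the unsplit 2x2 system."""
    M = np.eye(2) - h * A
    x = np.empty((2, N + 1))
    x[:, 0] = x0
    for m in range(1, N + 1):
        x[:, m] = np.linalg.solve(M, x[:, m - 1])
    return x


x_gs, it_gs = waveform_relaxation(gauss_seidel=True)
x_gj, it_gj = waveform_relaxation(gauss_seidel=False)
x_ref = backward_euler_coupled()
print(it_gs, it_gj, np.max(np.abs(x_gs - x_ref)))
```

On this example Gauss-Seidel should need roughly half as many iterations as Gauss-Jacobi, mirroring the behavior of the corresponding algebraic relaxation methods.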

The WR algorithm for solving systems of the form of Eqn. (2.2) is as follows.

Algorithm 4.1 (WR Gauss-Seidel Algorithm for solving Eqn. (2.2))

The superscript $k$ denotes the iteration count, the subscript $i$ denotes the component index of a vector, and $\epsilon$ is a small positive number.

k ← 0;
guess waveform $x^0(t)$, $t \in [0,T]$, such that $x^0(0) = x_0$
  (for example, set $x^0(t) = x_0$, $t \in [0,T]$);
repeat {
  k ← k + 1;
  foreach ( i ∈ {1, ..., n} ) {
    solve
$$ \sum_{j=1}^{i} C_{ij}(x_1^k, \ldots, x_i^k, x_{i+1}^{k-1}, \ldots, x_n^{k-1}, u)\,\dot{x}_j^k + \sum_{j=i+1}^{n} C_{ij}(x_1^k, \ldots, x_i^k, x_{i+1}^{k-1}, \ldots, x_n^{k-1}, u)\,\dot{x}_j^{k-1} - f_i(x_1^k, \ldots, x_i^k, x_{i+1}^{k-1}, \ldots, x_n^{k-1}, u) = 0 $$
    for ( $x_i^k(t)$; $t \in [0,T]$ ), with the initial condition $x_i^k(0) = x_{i0}$.
  }
} until ( $\|x^k - x^{k-1}\| < \epsilon$ )

that is, until the iteration converges.

■

Note that the differential equation in Algorithm 4.1 has only one unknown variable, $x_i^k$. The variables $x_{i+1}^{k-1}, \ldots, x_n^{k-1}$ are known from the previous iteration, and the variables $x_1^k, \ldots, x_{i-1}^k$ have already been computed. Also, the Gauss-Jacobi version of the WR algorithm for Eqn. (2.2) can be obtained from Algorithm 4.1 by replacing the foreach statement with a forall statement and adjusting the iteration indices.
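A compact sketch of Algorithm 4.1 for an $n$-dimensional system is given below. To keep it short, it assumes a constant matrix $C$ (so the $C_{ij}$ need not be re-evaluated at mixed arguments), a right-hand side $f(x)$ with no input $u$, backward Euler on a fixed grid for each scalar equation, and scalar Newton iterations for the resulting implicit equation. The function name `wr_solve` and the three-node test problem are hypothetical; passing `gauss_seidel=False` gives the Gauss-Jacobi variant.

```python
import numpy as np


def wr_solve(C, f, dfdx_diag, x0, T, N, gauss_seidel=True, tol=1e-10, max_iter=200):
    """Waveform relaxation (Algorithm 4.1) for C x' = f(x), x(0) = x0, with a
    constant matrix C. dfdx_diag(x) must return the diagonal of df/dx."""
    n, h = len(x0), T / N
    x = np.tile(np.asarray(x0, float)[:, None], (1, N + 1))  # x^0(t) = x0
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            newer = np.arange(n) < i             # components j < i
            src = x if gauss_seidel else x_old   # new (Gauss-Seidel) or old (Jacobi)
            for m in range(1, N + 1):
                # Waveform values seen by equation i at t_m and t_{m-1}.
                y = np.where(newer, src[:, m], x_old[:, m])
                y_prev = np.where(newer, src[:, m - 1], x_old[:, m - 1])
                y_prev[i] = x[i, m - 1]          # x_i^k is already known at t_{m-1}
                xi = y_prev[i]                   # Newton starting guess
                for _ in range(20):              # scalar Newton on the BE equation
                    y[i] = xi
                    g = C[i] @ ((y - y_prev) / h) - f(y)[i]
                    dg = C[i, i] / h - dfdx_diag(y)[i]
                    step = g / dg
                    xi -= step
                    if abs(step) < 1e-14:
                        break
                x[i, m] = xi
        if np.max(np.abs(x - x_old)) < tol:
            return x, k
    return x, max_iter


# Hypothetical three-node example with a diagonally dominant C.
C = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
f = lambda x: -A @ x - 0.1 * x**3
dfdx_diag = lambda x: -np.diag(A) - 0.3 * x**2
T, N = 1.0, 50
x, iters = wr_solve(C, f, dfdx_diag, [1.0, 0.0, -1.0], T, N)

# At convergence the waveforms satisfy backward Euler for the coupled system.
h = T / N
residual = max(np.max(np.abs(C @ (x[:, m] - x[:, m - 1]) / h - f(x[:, m])))
               for m in range(1, N + 1))
print(iters, residual)
```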

SECTION 4.2 - CONVERGENCE PROOF FOR THE BASIC WR ALGORITHM

If the matrix $C(x,u)$ is strictly diagonally dominant and Lipschitz continuous with respect to $x$ for all $u$, then both the Gauss-Seidel and the Gauss-Jacobi versions of Algorithm 4.1 are guaranteed to converge. In [12], it was shown that the WR algorithm converges when applied to Eqn. (2.2) if $C(x,u)$ is diagonally dominant and independent of $x$. As many systems that are modelled in the form of Eqn. (2.2) include a dependence of $C$ on $x$, we will present a more general convergence proof that extends the original theorem to include these systems. In addition, we will prove that the WR algorithm is a contraction in a simpler norm than the one used in the original theorem.

We will prove the theorem by first showing that if $C(x,u)$ is diagonally dominant, then there exists a bound on the $\dot{x}^k$'s generated by the WR algorithm that is independent of $k$. Using this bound, we will show that the assumption that $C(x,u)$ is Lipschitz continuous implies there exists a norm on $\mathbb{R}^n$ such that for arbitrary positive integers $j$ and $k$,

$$ \|\dot{x}^{k+1}(t) - \dot{x}^{j+1}(t)\| \le \gamma\|\dot{x}^k(t) - \dot{x}^j(t)\| + l_1\|x^{k+1}(t) - x^{j+1}(t)\| + l_2\|x^k(t) - x^j(t)\| $$

for some $\gamma < 1$ and $l_1, l_2 < \infty$, for all $t \in [0,T]$. In the properly chosen norm $\|\cdot\|_b$ on $C([0,T], \mathbb{R}^n)$ the above inequality implies that

$$ \|\dot{x}^{k+1} - \dot{x}^{j+1}\|_b \le \alpha\|\dot{x}^k - \dot{x}^j\|_b $$

where $\alpha < 1$, and therefore the sequence $\{\dot{x}^k\}$ converges by the contraction mapping theorem. As $x^k(0) = x_0$ for all $k$, $\{x^k\}$ converges as well.

Before formally proving this basic WR convergence theorem, we will state the well-known contraction mapping theorem [35] and a few lemmas that will be used in the course of the proof.

The Contraction Mapping Theorem: Let $Y$ be a Banach space and $F: Y \to Y$. If $F$ is such that $\|F(y) - F(x)\| \le \gamma\|y - x\|$ for all $x, y \in Y$, for some $\gamma \in [0,1)$, then $F$ has a unique fixed point $\hat{y}$ such that $F(\hat{y}) = \hat{y}$. Furthermore, for any initial guess $y^0 \in Y$, the sequence $\{y^k \in Y\}$ generated by the fixed-point algorithm $y^k = F(y^{k-1})$ converges uniformly to $\hat{y}$.

Lemma 4.1: If $C(x,u) \in \mathbb{R}^{n\times n}$ is diagonally dominant uniformly over all $x \in \mathbb{R}^n$, $u \in \mathbb{R}^r$, then given any collection of vectors $\{x^1, \ldots, x^n\}$, $x^i \in \mathbb{R}^n$, and any $u \in \mathbb{R}^r$, the matrix $\hat{C}(x^1, \ldots, x^n, u) \in \mathbb{R}^{n\times n}$ defined by $\hat{C}_{ij}(x^1, \ldots, x^n, u) \equiv C_{ij}(x^i, u)$ is also diagonally dominant. In other words, let $\hat{C}$ be the matrix constructed by setting the $i$th row of $\hat{C}$ equal to the $i$th row of the given matrix $C(x^i, u)$. Then this new matrix is also diagonally dominant. ■

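Lemma 4.1 follows directly from the definition of diagonal dominance, which constrains each row separately: row $i$ of $\hat{C}$ is row $i$ of the diagonally dominant matrix $C(x^i, u)$.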

Lemma 4.2: Let $C \in \mathbb{R}^{n\times n}$ be any strictly diagonally dominant matrix. Let $L$ strictly lower triangular, $U$ strictly upper triangular, and $D$ diagonal be such that $C = L + D + U$. Then

$$ \|D^{-1}(L + U)\|_\infty < 1 \quad \text{and} \quad \|(D + L)^{-1}U\|_\infty < 1. $$

■

Lemma 4.2 is a standard result in matrix theory [28].
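Both lemmas are easy to check numerically. The sketch below builds $\hat{C}$ of Lemma 4.1 by taking row $i$ from a different strictly diagonally dominant matrix for each $i$ (random matrices standing in for $C(x^i, u)$, an assumption for illustration), then evaluates the two $\infty$-norms of Lemma 4.2, which are the norms of the Gauss-Jacobi and Gauss-Seidel iteration matrices.

```python
import numpy as np

rng = np.random.default_rng(0)


def strictly_diagonally_dominant(M):
    """True if |M_ii| > sum_{j != i} |M_ij| for every row i."""
    diag = np.abs(np.diag(M))
    off = np.sum(np.abs(M), axis=1) - diag
    return bool(np.all(diag > off))


def random_sdd(n):
    """Random strictly diagonally dominant matrix (illustrative only)."""
    M = rng.uniform(-1.0, 1.0, (n, n))
    np.fill_diagonal(M, np.sum(np.abs(M), axis=1) + rng.uniform(0.1, 1.0, n))
    return M


n = 6
samples = [random_sdd(n) for _ in range(n)]            # stand-ins for C(x^i, u)
C_hat = np.array([samples[i][i] for i in range(n)])    # row i taken from C(x^i, u)
assert strictly_diagonally_dominant(C_hat)             # Lemma 4.1

D = np.diag(np.diag(C_hat))
L = np.tril(C_hat, -1)
U = np.triu(C_hat, 1)
jacobi_norm = np.linalg.norm(np.linalg.solve(D, L + U), np.inf)
gauss_seidel_norm = np.linalg.norm(np.linalg.solve(D + L, U), np.inf)
print(jacobi_norm, gauss_seidel_norm)                  # both below 1 (Lemma 4.2)
```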

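Lemma 4.3: Let $x, y \in C([0,T], \mathbb{R}^n)$ be continuously differentiable. If there exists some norm on $\mathbb{R}^n$ such that

$$ \|\dot{x}(t)\| \le \gamma\|\dot{y}(t)\| + l_1\|x(t)\| + l_2\|y(t)\| \tag{4.2} $$

for all $t \in [0,T]$, for some positive numbers $l_1, l_2 < \infty$ and $\gamma < 1$, then there exists a norm $\|\cdot\|_b$ on $C([0,T], \mathbb{R}^n)$ such that

$$ \|\dot{x}\|_b \le \alpha\|\dot{y}\|_b + \beta\left(l_1\|x(0)\| + l_2\|y(0)\|\right) \tag{4.3} $$

for some positive numbers $\alpha < 1$ and $\beta < \infty$. ■

Proof of Lemma 4.3: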

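Substituting $\int_0^t \dot{x}(\tau)\,d\tau + x(0)$ for $x(t)$ in Eqn. (4.2), performing an analogous substitution for $y(t)$, multiplying the entire equation by $e^{-bt}$, and moving the norms inside the integrals yields:

$$ e^{-bt}\|\dot{x}(t)\| \le \gamma e^{-bt}\|\dot{y}(t)\| + l_1 e^{-bt}\int_0^t \|\dot{x}(\tau)\|\,d\tau + l_1 e^{-bt}\|x(0)\| + l_2 e^{-bt}\int_0^t \|\dot{y}(\tau)\|\,d\tau + l_2 e^{-bt}\|y(0)\| \tag{4.4} $$

Let $\|\cdot\|_b$ be defined by $\|f\|_b \equiv \max_{t\in[0,T]} e^{-bt}\|f(t)\|$. This is a norm on $C([0,T], \mathbb{R}^n)$ for any finite positive number $b > 0$, and it is equivalent to the uniform norm on $C([0,T], \mathbb{R}^n)$, since $e^{-bT}\|f\|_\infty \le \|f\|_b \le \|f\|_\infty$. Because $e^{-bt}\int_0^t \|\dot{x}(\tau)\|\,d\tau \le \|\dot{x}\|_b\, e^{-bt}\int_0^t e^{b\tau}\,d\tau$ (and similarly for $y$) and $e^{-bt} \le 1$, Eqn. (4.4) implies

$$ \|\dot{x}\|_b \le \gamma\|\dot{y}\|_b + \max_{t\in[0,T]}\left[ e^{-bt}\int_0^t e^{b\tau}\,d\tau \left( l_1\|\dot{x}\|_b + l_2\|\dot{y}\|_b \right) \right] + l_1\|x(0)\| + l_2\|y(0)\|. $$

And since $e^{-bt}\int_0^t e^{b\tau}\,d\tau \le \frac{1}{b}$, for $b > l_1$ we can write

$$ \|\dot{x}\|_b \le \frac{\gamma + l_2 b^{-1}}{1 - l_1 b^{-1}}\,\|\dot{y}\|_b + \frac{l_1\|x(0)\| + l_2\|y(0)\|}{1 - l_1 b^{-1}}. \tag{4.5} $$

In this case $\gamma$ is less than 1, so there exists a finite $B$ for which

$$ \alpha \equiv \frac{\gamma + l_2 B^{-1}}{1 - l_1 B^{-1}} < 1 $$

(any $B > (l_1 + l_2)/(1 - \gamma)$ will do). Let the $b$ in Eqn. (4.5) be set equal to this $B$ to get

$$ \|\dot{x}\|_B \le \alpha\|\dot{y}\|_B + \beta\left(l_1\|x(0)\| + l_2\|y(0)\|\right), \qquad \beta \equiv \frac{1}{1 - l_1 B^{-1}} \tag{4.6} $$

which completes the proof. ■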
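The proof also shows how to pick $B$: from Eqn. (4.5), $\alpha(b) = (\gamma + l_2/b)/(1 - l_1/b) < 1$ exactly when $\gamma + l_2/b < 1 - l_1/b$, that is, when $b > (l_1 + l_2)/(1 - \gamma)$. The sketch below encodes that condition and the weighted norm $\|\cdot\|_b$ on sampled waveforms; the values of $\gamma$, $l_1$, $l_2$ and the sample waveform in the usage lines are arbitrary assumptions.

```python
import numpy as np


def b_norm(t, w, b):
    """||w||_b = max_t exp(-b t) ||w(t)||_inf for a waveform sampled on the
    grid t (w has shape (n, len(t)))."""
    return np.max(np.exp(-b * t) * np.max(np.abs(w), axis=0))


def contraction_factor(gamma, l1, l2, b):
    """alpha(b) = (gamma + l2 / b) / (1 - l1 / b) from Eqn. (4.5); needs b > l1."""
    if b <= l1:
        raise ValueError("Eqn. (4.5) requires b > l1")
    return (gamma + l2 / b) / (1.0 - l1 / b)


def threshold_b(gamma, l1, l2):
    """alpha(b) < 1 if and only if b > (l1 + l2) / (1 - gamma), for gamma < 1."""
    return (l1 + l2) / (1.0 - gamma)


gamma, l1, l2 = 0.5, 4.0, 2.0             # illustrative constants
b_star = threshold_b(gamma, l1, l2)       # 12.0
alpha = contraction_factor(gamma, l1, l2, 2.0 * b_star)

# ||.||_b is equivalent to the uniform norm: exp(-bT)||w|| <= ||w||_b <= ||w||.
t = np.linspace(0.0, 1.0, 101)
w = np.vstack((np.sin(5 * t), np.cos(3 * t)))
uniform = np.max(np.abs(w))
print(b_star, alpha, np.exp(-b_star) * uniform <= b_norm(t, w, b_star) <= uniform)
```

Larger $b$ gives a smaller $\alpha$, approaching $\gamma$, at the price of a norm that discounts late times more heavily.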

Now we prove the following WR convergence theorem for systems of equations of the form of Eqn. (2.2).

Theorem 4.1: If, in addition to the assumptions of Eqn. (2.2), $C(x(t), u(t)) \in \mathbb{R}^{n\times n}$ is strictly diagonally dominant uniformly over all $x(t) \in \mathbb{R}^n$ and $u(t) \in \mathbb{R}^r$ and Lipschitz continuous with respect to $x(t)$ for all $u(t)$, and $x^0(t)$ is differentiable, then the sequence of waveforms $\{x^k\}$ generated by the Gauss-Seidel or Gauss-Jacobi WR algorithm will converge uniformly to the solution of Eqn. (2.2) for all bounded intervals $[0,T]$. ■

Proof of Theorem 4.1:

We will present the proof only for the Gauss-Seidel WR algorithm, as the proof for the Gauss-Jacobi case is almost identical. The equations for one iteration of the Gauss-Seidel WR algorithm applied to Eqn. (2.2) can be written in matrix form as

$$ \left(\hat{L}^{k+1} + \hat{D}^{k+1}\right)\dot{x}^{k+1} + \hat{U}^{k+1}\dot{x}^{k} = \hat{f}(x^{k+1}, x^k, u) $$

where $\hat{C}(x^{k+1}, x^k, u) = \hat{L}^{k+1} + \hat{D}^{k+1} + \hat{U}^{k+1}$, with $\hat{L}^{k+1}$ strictly lower triangular, $\hat{U}^{k+1}$ strictly upper triangular, and $\hat{D}^{k+1}$ diagonal,

$$ \hat{C}_{ij}(x^{k+1}, x^k, u) = C_{ij}(x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k, u) $$

and

$$ \hat{f}_i(x^{k+1}, x^k, u) = f_i(x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k, u). $$

(Note that by Lemma 4.1, the matrix $\hat{C}$ is diagonally dominant because $C$ is diagonally dominant.) Rearranging the iteration equation yields:

$$ \dot{x}^{k+1} = \left(\hat{L}^{k+1} + \hat{D}^{k+1}\right)^{-1}\left[-\hat{U}^{k+1}\dot{x}^k + \hat{f}(x^{k+1}, x^k, u)\right] \tag{4.7} $$

Taking the difference between Eqn. (4.7) at iteration $k+1$ and at iteration $j+1$ yields

$$ \dot{x}^{k+1} - \dot{x}^{j+1} = -(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\hat{U}^{k+1}\dot{x}^k + (\hat{L}^{j+1} + \hat{D}^{j+1})^{-1}\hat{U}^{j+1}\dot{x}^j + (\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\hat{f}(x^{k+1}, x^k, u) - (\hat{L}^{j+1} + \hat{D}^{j+1})^{-1}\hat{f}(x^{j+1}, x^j, u) \tag{4.8} $$

Using in Eqn. (4.8) the Lipschitz continuity of $\hat{f}$, and the fact that $\|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\| \le K$ for some $K < \infty$ independent of $x$ and $k$ (because $C(x,u)$ is uniformly diagonally dominant with respect to $x$), leads to

$$ \|\dot{x}^{k+1}(t) - \dot{x}^{j+1}(t)\| \le l_1K\|x^{k+1}(t) - x^{j+1}(t)\| + l_2K\|x^k(t) - x^j(t)\| + \|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1} - (\hat{L}^{j+1} + \hat{D}^{j+1})^{-1}\|\,\|\hat{f}(x^{j+1}, x^j, u)\| + \|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\hat{U}^{k+1}\dot{x}^k(t) - (\hat{L}^{j+1} + \hat{D}^{j+1})^{-1}\hat{U}^{j+1}\dot{x}^j(t)\| \tag{4.9} $$

where $l_1$ is the Lipschitz constant of $\hat{f}$ with respect to its first argument, and $l_2$ is the Lipschitz constant of $\hat{f}$ with respect to its second argument. That $C(x,u)$ is uniformly diagonally dominant and Lipschitz continuous with respect to $x$ for all $u$ implies that $(\hat{L}^k + \hat{D}^k)^{-1}$ and $(\hat{L}^k + \hat{D}^k)^{-1}\hat{U}^k$ are also Lipschitz continuous in the same manner. It then follows that there exist some positive finite numbers $k_1, k_2, k_3, k_4$ such that

$$ \|\dot{x}^{k+1}(t) - \dot{x}^{j+1}(t)\| \le l_1K\|x^{k+1}(t) - x^{j+1}(t)\| + l_2K\|x^k(t) - x^j(t)\| + \left[k_3\|x^{k+1}(t) - x^{j+1}(t)\| + k_4\|x^k(t) - x^j(t)\|\right]\|\hat{f}(x^{j+1}, x^j, u)\| + \left[k_1\|x^{k+1}(t) - x^{j+1}(t)\| + k_2\|x^k(t) - x^j(t)\|\right]\|\dot{x}^k(t)\| + \gamma\|\dot{x}^k(t) - \dot{x}^j(t)\| \tag{4.10} $$

where $k_1$ is the Lipschitz constant of $(\hat{L}^k + \hat{D}^k)^{-1}\hat{U}^k$ with respect to its first $x$ argument (see the definitions of $\hat{L}^k$, $\hat{U}^k$, and $\hat{D}^k$ above), and $k_2$ is the Lipschitz constant with respect to its second $x$ argument; $k_3$ and $k_4$ are the Lipschitz constants of $(\hat{L}^k + \hat{D}^k)^{-1}$ with respect to its first and second $x$ arguments; and $\gamma$ is such that $\|(\hat{L}^k + \hat{D}^k)^{-1}\hat{U}^k\| \le \gamma < 1$ independent of $k$ (by Lemma 4.2).

To establish a bound on the terms in Eqn. (4.10) involving $\|\dot{x}^k(t)\|$ and $\|\hat{f}(x^{j+1}, x^j, u)\|$, it is necessary to show that the $\dot{x}^k$'s, and therefore the $x^k$'s and $\hat{f}(\cdot)$'s, are bounded a priori. We prove such a bound exists in the following lemma.

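Lemma 4.4: If $C(x,u)$ in Eqn. (2.2) is strictly diagonally dominant and Lipschitz continuous, then the $\dot{x}^k(t)$'s produced by Algorithm 4.1 are bounded independent of $k$. ■

Proof of Lemma 4.4

If $\|\cdot\|$ is the $l_\infty$ norm on $\mathbb{R}^n$, then by Lemma 4.2 $\|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\hat{U}^{k+1}\| < 1$. From Eqn. (4.7),

$$ \|\dot{x}^{k+1}(t)\| \le \gamma\|\dot{x}^k(t)\| + \|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\|\,\|\hat{f}(x^{k+1}(t), x^k(t), u(t))\| \tag{4.11} $$

for some positive number $\gamma < 1$. As $f(x,u)$ is globally Lipschitz continuous with respect to $x$, there exist finite positive constants $l_1, l_2$ such that

$$ \|\hat{f}(x, y, u) - \hat{f}(w, z, u)\| \le l_1\|x - w\| + l_2\|y - z\| \tag{4.12} $$

for all $u$ and all $x, y, w, z \in \mathbb{R}^n$. From Eqns. (4.11) and (4.12), and using the fact that $\|(\hat{L}^{k+1} + \hat{D}^{k+1})^{-1}\|$ is bounded by some $K < \infty$ for all $k$:

$$ \|\dot{x}^{k+1}(t)\| \le \gamma\|\dot{x}^k(t)\| + l_1K\|x^{k+1}(t)\| + l_2K\|x^k(t)\| + K\|\hat{f}(0, 0, u)\| \tag{4.13} $$

Eqn. (4.13) is in the form to which a slightly modified Lemma 4.3 applies (the additional constant term $K\|\hat{f}(0,0,u)\|$ is carried through the same steps). Therefore there exists some $\|\cdot\|_b$ such that

$$ \|\dot{x}^{k+1}\|_b \le \alpha\|\dot{x}^k\|_b + \beta\left[(l_1K + l_2K)\|x(0)\| + K\|\hat{f}(0, 0, u)\|\right] \tag{4.14} $$

where $\alpha < 1$ and $\beta < \infty$. This implies that

$$ \|\dot{x}^{k+1}\|_b \le \frac{\beta}{1 - \alpha}\left[(l_1K + l_2K)\|x(0)\| + K\|\hat{f}(0, 0, u)\|\right] + \alpha^{k+1}\|\dot{x}^0\|_b \tag{4.15} $$

for all $k$. Then, since $\|\dot{x}^0\|_b$ is bounded by assumption, and $\|\dot{x}^{k+1}\|_b = \max_{t\in[0,T]} e^{-bt}\|\dot{x}^{k+1}(t)\|$,

$$ \|\dot{x}^{k+1}(t)\| \le e^{bT}\left[\frac{\beta}{1 - \alpha}\left[(l_1K + l_2K)\|x(0)\| + K\|\hat{f}(0, 0, u)\|\right] + \|\dot{x}^0\|_b\right] = M \tag{4.16} $$

which proves the lemma. ■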

In Lemma 4.4 it was proved that $\|\dot{x}^k(t)\|$ is bounded a priori by some $M$. This implies that $x^k(t)$ is bounded on $[0,T]$. Using the Lipschitz continuity property of $\hat{f}$, a bound, $N$, can be derived for $\|\hat{f}(x^{k+1}(t), x^k(t), u)\|$. Applying these bounds to Eqn. (4.10) we get

$$ \|\dot{x}^{k+1}(t) - \dot{x}^{j+1}(t)\| \le \gamma\|\dot{x}^k(t) - \dot{x}^j(t)\| + (l_1K + Mk_1 + Nk_3)\|x^{k+1}(t) - x^{j+1}(t)\| + (l_2K + Mk_2 + Nk_4)\|x^k(t) - x^j(t)\| \tag{4.17} $$

where $\gamma < 1$. Eqn. (4.17) is of the form to apply Lemma 4.3. As $x^{k+1}(0) - x^{j+1}(0) = 0$ for all $k, j$, Lemma 4.3 implies

$$ \|\dot{x}^{k+1} - \dot{x}^{j+1}\|_b \le \alpha\|\dot{x}^k - \dot{x}^j\|_b \tag{4.18} $$

for some norm on $C([0,T], \mathbb{R}^n)$ and for some $\alpha < 1$. As $C([0,T], \mathbb{R}^n)$ is complete in any one of the $b$-norms, by the contraction mapping theorem $\dot{x}^k$ converges to some $\dot{x} \in C([0,T], \mathbb{R}^n)$, which is a fixed point of Eqn. (4.7). Any fixed point $x$ of Eqn. (4.7) is a solution to Eqn. (2.2) if $x(0) = x_0$; since $x^k(0) = x_0$ for all $k$, $x^k$ converges to the unique solution of Eqn. (2.2). The sequence $\{x^k\}$ converges because integration from 0 to $T$, which maps $\dot{x}(t)$ to $x(t)$, is a bounded continuous function.

SECTION 4.3 - NONSTATIONARY WR ALGORITHMS

Algorithm 4.1 is stationary in the sense that the equations that define the iteration process do not change with the iterations. A straightforward generalization is to allow these iteration equations to change, and to consider under what conditions the relaxation still converges [13]. There are two major reasons for studying nonstationary algorithms. First, the solution of the ordinary differential equations in the inner loop of Algorithm 4.1 cannot be obtained exactly. Instead, numerical methods compute the solution with some error, which is in general controlled but cannot be eliminated. However, the discrete approximation can be interpreted as the exact solution to a perturbed system. Since the approximation changes with the solutions, the perturbed system changes with each iteration. Hence, practical implementations of WR, which must compute the solution to the iteration equations approximately, can be interpreted as nonstationary methods.

The second reason for studying nonstationary methods is that they can be used to improve the computational efficiency of the basic WR algorithm. One approach is to increase the accuracy with which the iteration equations are solved as the relaxation approaches convergence. In this way, accurate solutions to the original system are still obtained, but unnecessarily accurate computation of the early iteration waveforms, which are usually far from the final solution, is avoided.

In this section we show that nonstationary WR algorithms converge as a direct consequence of the contraction mapping property of the original WR algorithm. That is, given mild assumptions about the relationship between a general stationary contraction map and a nonstationary map, the nonstationary map will produce a sequence that converges to within some tolerance. And if, in the limit as $k \to \infty$, the nonstationary map approaches the stationary map, then the sequence generated by the nonstationary map will converge to the fixed point of the original map. In later sections we will rely on these results to guarantee the convergence of implementations of WR-based algorithms.
Theorem 4.2: Let $Y$ be a Banach space and $F, F^k: Y \to Y$. Define $y^{k+1} = F(y^k)$ and $\hat{y}^{k+1} = F^{k+1}(\hat{y}^k)$. If $F$ is a contraction mapping with contraction factor $\gamma$ (see Section 4.2), $\|F(y) - F^k(y)\| \le \delta_k$ for all $y \in Y$, and $z \in Y$ is such that $z = F(z)$, then for any $\epsilon > 0$ there exists a $\delta > 0$ such that if $\delta_k \le \delta$ for all $k$, then

$$ \limsup_{k\to\infty}\|\hat{y}^k - \hat{y}^{k-1}\| < \epsilon \quad \text{and} \quad \limsup_{k\to\infty}\|z - \hat{y}^k\| \le \frac{\delta}{1 - \gamma}. $$

Furthermore, if $\delta_k \to 0$, then $\lim_{k\to\infty}\|\hat{y}^k - \hat{y}^{k-1}\| = 0$ and $\lim_{k\to\infty}\|z - \hat{y}^k\| = 0$. ■
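Proof of Theorem 4.2

Taking the norm of the difference between the $k$th and $(k+1)$st iterations of the nonstationary algorithm, we get:

$$ \|\hat{y}^{k+1} - \hat{y}^k\| = \|F^{k+1}(\hat{y}^k) - F^k(\hat{y}^{k-1})\|. \tag{4.19} $$

Given that $\|F^k(y) - F(y)\| \le \delta_k$ for all $y \in Y$,

$$ \|\hat{y}^{k+1} - \hat{y}^k\| \le \|F(\hat{y}^k) - F(\hat{y}^{k-1})\| + \delta_k + \delta_{k+1}. \tag{4.20} $$

Using the contraction property of $F$,

$$ \|\hat{y}^{k+1} - \hat{y}^k\| \le \gamma\|\hat{y}^k - \hat{y}^{k-1}\| + \delta_k + \delta_{k+1}. \tag{4.21} $$

Unfolding this recursion into direct sum form gives

$$ \|\hat{y}^{k+1} - \hat{y}^k\| \le \gamma^k\|\hat{y}^1 - \hat{y}^0\| + \sum_{i=0}^{k-1}\gamma^i\left(\delta_{k+1-i} + \delta_{k-i}\right). \tag{4.22} $$

If $\delta_k \le \delta$ for all $k$, then from Eqn. (4.22)

$$ \limsup_{k\to\infty}\|\hat{y}^{k+1} - \hat{y}^k\| \le \frac{2\delta}{1 - \gamma}. \tag{4.23} $$

As $\gamma < 1$, $\limsup_{k\to\infty}\|\hat{y}^{k+1} - \hat{y}^k\|$ can be made as small as desired by reducing $\delta$ (any $\delta < \epsilon(1 - \gamma)/2$ suffices), which proves the first part of Theorem 4.2.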

Let $z$ be the fixed point of $F$. The difference between the computed and the exact solution at the $(k+1)$st iteration is

$$ \|\hat{y}^{k+1} - z\| = \|F^{k+1}(\hat{y}^k) - F(z)\|. \tag{4.24} $$

Again using the contractive property of $F$ and that $\|F(y) - F^{k+1}(y)\| \le \delta_{k+1}$,

$$ \|\hat{y}^{k+1} - z\| \le \gamma\|\hat{y}^k - z\| + \delta_{k+1}. \tag{4.25} $$

Unfolding this recursion in the same way and taking the limit,

$$ \limsup_{k\to\infty}\|\hat{y}^k - z\| \le \frac{\delta}{1 - \gamma}, \tag{4.26} $$

which completes the proof of the first statement of Theorem 4.2. The second statement of the theorem follows from almost identical arguments. ■
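A scalar example makes the two cases of Theorem 4.2 concrete. The sketch assumes the contraction $F(y) = \gamma y + c$ on $Y = \mathbb{R}$ and a hypothetical perturbation $F^k(y) = F(y) + \delta_k \sin k$, which satisfies $|F^k(y) - F(y)| \le \delta_k$. With constant $\delta_k = \delta$ the error should settle below $\delta/(1-\gamma)$, as in Eqn. (4.26); with $\delta_k \to 0$ the iterates should reach the fixed point.

```python
import math

gamma, c = 0.8, 1.0
z = c / (1.0 - gamma)                 # fixed point of F(y) = gamma*y + c


def F(y):
    """Stationary contraction with factor gamma."""
    return gamma * y + c


def run_nonstationary(delta, iters=400):
    """Iterate y^k = F^k(y^{k-1}) with F^k(y) = F(y) + delta(k) * sin(k),
    so |F^k(y) - F(y)| <= delta(k)."""
    y = 0.0
    for k in range(1, iters + 1):
        y = F(y) + delta(k) * math.sin(k)
    return y


delta = 1e-3
err_const = abs(run_nonstationary(lambda k: delta) - z)           # delta_k = delta
err_decay = abs(run_nonstationary(lambda k: delta * 0.5**k) - z)  # delta_k -> 0
bound = delta / (1.0 - gamma)                                      # Eqn. (4.26)
print(err_const, bound, err_decay)
```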

In Section 4.2 we proved that the WR iteration is a contraction mapping in the appropriate norm $\|\cdot\|_B$ on $C([0,T], \mathbb{R}^n)$, where $B$ depends on the problem. To repeat the result from that section, it was shown that

$$ \|\dot{x}^{k+1} - \dot{x}^{j+1}\|_B \le \alpha\|\dot{x}^k - \dot{x}^j\|_B $$

where $\alpha < 1$. This WR convergence result and Theorem 4.2 imply that using any "reasonable" approximation method to solve the WR iteration equations will still converge, provided the errors in the approximation are driven to zero. In addition, Theorem 4.2 indicates that it will be difficult to determine a priori how accurately the iteration equations must be solved to guarantee convergence to within a given tolerance, because an estimate of the contraction factor of the WR algorithm is required.




From Theorem 4.1, the WR algorithm is a contraction mapping with respect to $\dot{x}(t)$ in a $B$ norm; Theorem 4.2 then implies that the WR iteration equations must be solved accurately with respect to $\dot{x}(t)$ in this $B$ norm if the iterations are to converge. There is a more cumbersome proof of the WR convergence theorem in which it is shown that the WR algorithm is a contraction in $x(t)$, but in a larger $B$ norm than the one used in the proof of Theorem 4.1, and the size of this $B$ is a function of the magnitude of the off-diagonal terms of $C(x,u)$. With such a result, Theorem 4.2 implies that it is only necessary to control errors in the computation of $x(t)$ to guarantee iteration convergence. However, convergence in a larger $B$ norm is in some sense a weaker type of convergence. So, in the case where $C(x,u)$ has nonzero off-diagonal terms, it is expected that more rapid convergence would be achieved if the $x^k(t)$'s are computed in a way that also guarantees that the $\dot{x}^k(t)$'s are globally accurate.

SECTION 4.4 - WAVEFORM RELAXATION-NEWTON METHODS

The WR algorithm is an extension to function spaces of the relaxation methods used to solve linear and nonlinear algebraic systems. It is also possible to extend the Newton-Raphson algorithm to function spaces, and this extension also has practical applications. In particular, it is possible to solve the WR iteration equations approximately with one iteration of the Waveform-Newton algorithm; this is the function-space extension of the relaxation-Newton methods described in Section 3.2. In this section we will derive the function-space Newton method applied to systems of the form of Eqn. (2.2) and prove that the method has global convergence properties. We will then apply this method in conjunction with the WR algorithm to generate the Waveform-Relaxation-Newton (WRN) algorithm.

In order to derive a function-space extension of the Newton-Raphson algorithm, let $F(x)$ (from Eqn. (2.2)) be defined by

$$ F(x) \equiv C(x, u)\dot{x} - f(x, u) = 0, \qquad x(0) = x_0 \tag{4.27} $$
where $x: [0,T] \to \mathbb{R}^n$; $u: [0,T] \to \mathbb{R}^r$ is piecewise continuous; $C: \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^{n\times n}$ is such that $C(x,u)^{-1}$ exists and is uniformly bounded with respect to $x, u$; and $f: \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$ is globally Lipschitz continuous with respect to $x$ for all $u$. Applying the Newton-Raphson algorithm to find an $x$ such that $F(x) = 0$, given some initial guess $x^0$, we get

$$ x^{k+1} = x^k - J_F^{-1}(x^k)F(x^k) \tag{4.28} $$

where $J_F(x)$ is the Fréchet derivative of $F(x)$ with respect to $x$. Note that in this case $J_F(x)$ is a matrix-valued function on $[0,T]$; that is, $J_F(x)$ is a matrix of waveforms.

Using the definition of the Fréchet derivative, we can compute $J_F(x)$:

$$ \lim_{\|h\|\to 0}\frac{1}{\|h\|}\left\|F(x + h) - F(x) - J_F(x)h\right\| = 0. \tag{4.29} $$

Evaluating this limit for the $F(x)$ given in Eqn. (4.27), we get

$$ F(x + h) - F(x) = C(x + h, u)(\dot{x} + \dot{h}) - C(x, u)\dot{x} - f(x + h, u) + f(x, u) \tag{4.30} $$


and approximating to order $\|h\|^2$,

$$ F(x + h) - F(x) = C(x, u)\dot{h} + \left(\frac{\partial C(x,u)}{\partial x}h\right)\dot{x} - \frac{\partial f(x,u)}{\partial x}h + O(\|h\|^2) \tag{4.31} $$


As Eqn. (4.29) applies only in the limit as $h \to 0$, Eqn. (4.31) implies

$$ J_F(x)h = C(x, u)\dot{h} + \left(\frac{\partial C(x,u)}{\partial x}h\right)\dot{x} - \frac{\partial f(x,u)}{\partial x}h \tag{4.32} $$


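Substituting the computed derivative into Eqn. (4.28) and rearranging, we get

$$ C(x^k, u)\dot{x}^{k+1} + \left(\frac{\partial C(x^k,u)}{\partial x}(x^{k+1} - x^k)\right)\dot{x}^k = f(x^k, u) + \frac{\partial f(x^k,u)}{\partial x}(x^{k+1} - x^k) \tag{4.33} $$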


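We will refer to Eqn. (4.33) as the Waveform-Newton (WN) algorithm for solving Eqn. (2.2). It is, however, just the function-space extension of the classical Newton-Raphson algorithm. Note that each WN iteration requires the solution of a linear time-varying differential equation in the unknown $x^{k+1}$.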
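Since each WN iterate solves a linear time-varying system, a discretized version needs only linear solves. The sketch below assumes a constant $C$ (the simplification used in the proof of Theorem 4.3), an explicitly time-dependent right-hand side $f(x, t)$ in place of $f(x, u)$, and backward Euler on a fixed grid; the function name and the test problem are hypothetical. The recorded update sizes typically shrink very rapidly once the iterates are close, consistent with Newton-type convergence.

```python
import numpy as np


def waveform_newton(C, f, jac, x0, T, N, tol=1e-12, max_iter=50):
    """Waveform-Newton iteration, Eqn. (4.33), for C x' = f(x, t) with constant C.
    Each iteration solves the linear time-varying system
        C x_new' = f(x_old, t) + J(x_old, t) (x_new - x_old)
    by backward Euler on a fixed grid of N steps."""
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    x = np.tile(np.asarray(x0, float), (N + 1, 1))   # x^0(t) = x0; shape (N+1, n)
    updates = []
    for _ in range(max_iter):
        x_new = np.empty_like(x)
        x_new[0] = x0
        for m in range(1, N + 1):
            J = jac(x[m], t[m])                        # linearize about x^k(t_m)
            rhs = f(x[m], t[m]) - J @ x[m]
            # (C/h - J) x_new[m] = C x_new[m-1] / h + f(x^k) - J x^k
            x_new[m] = np.linalg.solve(C / h - J, C @ x_new[m - 1] / h + rhs)
        updates.append(np.max(np.abs(x_new - x)))
        x = x_new
        if updates[-1] < tol:
            break
    return t, x, updates


# Hypothetical two-variable test problem.
C = np.array([[1.0, 0.2], [0.2, 1.0]])
f = lambda x, t: np.array([-x[0] - x[0]**3 + x[1], x[0] - 2.0 * x[1] + np.sin(t)])
jac = lambda x, t: np.array([[-1.0 - 3.0 * x[0]**2, 1.0], [1.0, -2.0]])
t, x, updates = waveform_newton(C, f, jac, [1.0, 0.0], T=2.0, N=100)
print(updates)
```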


Newton algorithms converge quadratically when the iterated value is close to the correct solution, but they do not, in general, have global convergence properties. The WN algorithm, along with inheriting the locally quadratic convergence properties of general Newton methods, will also converge globally, given mild assumptions on the behavior of $\partial C(x,u)/\partial x$, as stated in the following theorem:


Theorem 4.3: For any system of the form of Eqn. (2.2) in which $\partial C(x,u)/\partial x$ is Lipschitz continuous with respect to $x$ for all $u$ and $f$ is continuously differentiable, the sequence $\{x^k\}$ generated by the WN algorithm converges uniformly to the solution of Eqn. (2.2). ■


Proof of Theorem 4.3

For this proof of the convergence of the Waveform-Newton method we will assume that $C(x,u)$ is independent of $x$ and $u$, as the proof for the general case is much more involved and does not provide much further insight into the nature of the convergence. For the case $C(x,u) = C$, the $\partial C/\partial x$ term vanishes and Eqn. (4.33) can be simplified to

$$ \dot{x}^{k+1} = C^{-1}f(x^k, u) + C^{-1}\frac{\partial f(x^k,u)}{\partial x}(x^{k+1} - x^k). \tag{4.34} $$


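Taking the difference between Eqn. (4.34) at iteration $k+1$ and the exact solution $x$, which satisfies $\dot{x} = C^{-1}f(x,u)$, and substituting $(x^{k+1} - x) + (x - x^k)$ for $x^{k+1} - x^k$ yields

$$ \dot{x}^{k+1} - \dot{x} = C^{-1}\left[f(x^k, u) - f(x, u)\right] + C^{-1}\left[\frac{\partial f(x^k,u)}{\partial x}\left((x^{k+1} - x) + (x - x^k)\right)\right]. \tag{4.35} $$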

As $C$ has a bounded inverse by the assumptions following Eqn. (2.2), and $f$ is continuously differentiable on $[0,T]$ and Lipschitz continuous, $\|C^{-1}\partial f(x,u)/\partial x\|$ is bounded by some constant $l_1$. Taking $l_1$ large enough that it also bounds $\|C^{-1}\|$ times the Lipschitz constant of $f$, we obtain

$$ \|\dot{x}^{k+1} - \dot{x}\| \le l_1\|x^k - x\| + l_1\|x^{k+1} - x\| + l_1\|x^k - x\|. \tag{4.36} $$

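Lemma 4.3 can be applied to Eqn. (4.36) (with $\gamma = 0$; the initial-condition terms vanish because $x^k(0) = x(0) = x_0$). Therefore there exist some $b < \infty$ and $\alpha < 1$ such that

$$ \|\dot{x}^{k+1} - \dot{x}\|_b \le \alpha\|\dot{x}^k - \dot{x}\|_b. \tag{4.37} $$

Therefore $\{\dot{x}^k\}$ converges to $\dot{x}$, the fixed point of Eqn. (4.34). Given $x^k(0) = x_0$ for all $k$, $\{x^k\}$ converges to the solution of Eqn. (2.2) on any bounded interval. ■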

As mentioned in the introduction, it is possible to combine the Waveform-Newton method derived above with the WR algorithm to construct the waveform extension of the relaxation-Newton algorithms presented in Section 3.2 [19]. The WR iteration equations are solved approximately by performing one step of this Newton method with each waveform relaxation iteration, to yield the following Waveform-Relaxation-Newton (WRN) algorithm.


Algorithm 4.2 (WRN Gauss-Seidel Algorithm for solving Eqn. (2.2))

The superscript $k$ denotes the iteration count, the subscript $i \in \{1, \ldots, n\}$ denotes the component index of a vector, and $\epsilon$ is a small positive number.

k ← 0;
guess waveform $x^0(t)$, $t \in [0,T]$, such that $x^0(0) = x_0$
  (for example, set $x^0(t) = x_0$, $t \in [0,T]$);
repeat {
  k ← k + 1;
  foreach ( i ∈ {1, ..., n} ) {
    solve
$$ \sum_{j=1}^{i} C_{ij}(x_1^k, \ldots, x_{i-1}^k, x_i^{k-1}, \ldots, x_n^{k-1}, u)\,\dot{x}_j^k + \frac{\partial C_{ii}(x_1^k, \ldots, x_{i-1}^k, x_i^{k-1}, \ldots, x_n^{k-1}, u)}{\partial x_i}\,(x_i^k - x_i^{k-1})\,\dot{x}_i^{k-1} + \sum_{j=i+1}^{n} C_{ij}(x_1^k, \ldots, x_{i-1}^k, x_i^{k-1}, \ldots, x_n^{k-1}, u)\,\dot{x}_j^{k-1} - f_i(x_1^k, \ldots, x_{i-1}^k, x_i^{k-1}, \ldots, x_n^{k-1}, u) - \frac{\partial f_i(x_1^k, \ldots, x_{i-1}^k, x_i^{k-1}, \ldots, x_n^{k-1}, u)}{\partial x_i}\,(x_i^k - x_i^{k-1}) = 0 $$
    for ( $x_i^k(t)$; $t \in [0,T]$ ), with the initial condition $x_i^k(0) = x_{i0}$.
  }
} until ( $\|x^k - x^{k-1}\| < \epsilon$ )


Like Algorithm 4.1, each equation has only one unknown variable, $x_i^k$, but in this case each of the nonlinear equations has been replaced by a simpler time-varying linear problem.

Given the global convergence properties of both the original WR and the WN algorithms, it is not surprising that the WRN algorithm has global convergence properties. We will state the convergence theorem, but will not present the proof because it is quite similar to the proofs of the basic WR and WN convergence theorems.
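A sketch of Algorithm 4.2 follows. It assumes a constant matrix $C$ (so the $\partial C_{ii}/\partial x_i$ term vanishes), a right-hand side $f(x)$ with no input $u$, and backward Euler on a fixed grid. Because each scalar equation is linear in $x_i^k$, every timestep costs one division rather than a Newton loop. The function name and the three-node test problem are hypothetical; at convergence the waveforms should satisfy backward Euler for the coupled system, which the sketch checks through a residual.

```python
import numpy as np


def wrn_gauss_seidel(C, f, dfdx_diag, x0, T, N, tol=1e-10, max_iter=200):
    """Sketch of Algorithm 4.2 (WRN, Gauss-Seidel) for C x' = f(x) with a
    constant C. Equation i is linearized in x_i about the previous iterate
    and integrated by backward Euler; dfdx_diag(x) is the diagonal of df/dx."""
    n, h = len(x0), T / N
    x = np.tile(np.asarray(x0, float)[:, None], (1, N + 1))  # x^0(t) = x0
    for k in range(1, max_iter + 1):
        x_old = x.copy()
        for i in range(n):
            for m in range(1, N + 1):
                # Arguments (x_1^k, ..., x_{i-1}^k, x_i^{k-1}, ..., x_n^{k-1}) at t_m.
                y = np.concatenate((x[:i, m], x_old[i:, m]))
                # Known derivative terms: x_j^k for j < i, x_j^{k-1} for j > i.
                xdot = np.concatenate(((x[:i, m] - x[:i, m - 1]) / h, [0.0],
                                       (x_old[i + 1:, m] - x_old[i + 1:, m - 1]) / h))
                d = dfdx_diag(y)[i]
                # C_ii (x_i[m] - x_i[m-1]) / h + sum_{j != i} C_ij xdot_j
                #     = f_i(y) + d (x_i[m] - x_i^{k-1}[m])
                rhs = C[i, i] * x[i, m - 1] / h - C[i] @ xdot + f(y)[i] - d * x_old[i, m]
                x[i, m] = rhs / (C[i, i] / h - d)
        if np.max(np.abs(x - x_old)) < tol:
            return x, k
    return x, max_iter


# Hypothetical three-node example (diagonally dominant C).
C = np.array([[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]])
A = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
f = lambda x: -A @ x - 0.1 * x**3
dfdx_diag = lambda x: -np.diag(A) - 0.3 * x**2
T, N = 1.0, 50
x, iters = wrn_gauss_seidel(C, f, dfdx_diag, [1.0, 0.0, -1.0], T, N)
h = T / N
residual = max(np.max(np.abs(C @ (x[:, m] - x[:, m - 1]) / h - f(x[:, m])))
               for m in range(1, N + 1))
print(iters, residual)
```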

Theorem 4.4: If, in addition to the assumptions of Theorem 4.1, $\partial C(x,u)/\partial x$ is Lipschitz continuous with respect to $x$ for all $u$, then the sequence $\{x^k\}$ generated by the Gauss-Seidel or Gauss-Jacobi WRN algorithm converges to the solution of Eqn. (2.2) on all bounded intervals $[0,T]$. ■

The linear time-varying systems generated by the WRN algorithm are easier to solve numerically than the nonlinear iteration equations of the basic WR algorithm. For example, if an implicit multistep integration method is used to solve such a system, the implicit algebraic equations the multistep method generates will be linear. In addition, linear time-varying systems can be solved with a variety of efficient numerical techniques other than the standard discretization methods, such as collocation[58] and spectral methods[22].
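The following Python sketch illustrates that point by running the Gauss-Seidel WRN iteration of Algorithm 4.2 with implicit Euler as the integration method. It is a minimal sketch under stated assumptions, not the thesis implementation: it assumes $C = I$, so the capacitance terms reduce to the time derivative, and it uses a hypothetical two-variable nonlinear system; the names `f`, `dfii` and `wrn_gauss_seidel` are illustrative only. Because each WRN equation is linear in $x_i^k$, every implicit-Euler step reduces to one division instead of a nonlinear solve.

```python
# Sketch of Gauss-Seidel WRN (Algorithm 4.2) with implicit Euler.
# Assumptions (not from the thesis): C = I and the hypothetical f below.


def f(x):
    """Hypothetical two-variable nonlinear system dx/dt = f(x)."""
    x1, x2 = x
    return [-x1 + 0.5 * x2 - 0.2 * x1 ** 3 + 1.0,
            -x2 + 0.5 * x1 - 0.2 * x2 ** 3]


def dfii(x, i):
    """Diagonal Jacobian entry df_i/dx_i of the system above."""
    return -1.0 - 0.6 * x[i] ** 2


def wrn_gauss_seidel(x0, h, steps, tol=1e-10, max_iter=100):
    n = len(x0)
    # Initial guess: constant waveforms x^0(t) = x0.
    prev = [[x0[i]] * (steps + 1) for i in range(n)]
    for it in range(1, max_iter + 1):
        new = [row[:] for row in prev]
        for i in range(n):                  # whole waveform of x_i before x_{i+1}
            for m in range(1, steps + 1):
                # Linearization point: x_j^k for j < i, x_j^{k-1} for j >= i.
                xt = [new[j][m] if j < i else prev[j][m] for j in range(n)]
                J = dfii(xt, i)
                # Implicit Euler on x_i' = f_i(xt) + J * (x_i - x_i^{k-1})
                # is linear in x_i^k(m): one division, no Newton loop.
                rhs = new[i][m - 1] / h + f(xt)[i] - J * prev[i][m]
                new[i][m] = rhs / (1.0 / h - J)
        diff = max(abs(new[i][m] - prev[i][m])
                   for i in range(n) for m in range(steps + 1))
        prev = new
        if diff < tol:
            return prev, it
    return prev, max_iter


x, iters = wrn_gauss_seidel([0.0, 0.0], h=0.1, steps=50)
print(iters, x[0][-1], x[1][-1])
```

At convergence the iterates satisfy the implicit-Euler equations of the original nonlinear system, since the linearization error vanishes when $x_i^k = x_i^{k-1}$.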


CHAPTER  5  -  DISCRETIZED  WR  ALGORITHMS 
To compute the iteration waveforms for the WR algorithm, it is usually necessary to solve systems of nonlinear ordinary differential equations. If multistep integration formulas are used to solve for the iteration waveforms, the differential equations that describe the decomposed systems will not be solved exactly. Therefore, the convergence theorem presented in Section 4.2 does not guarantee the convergence of this discretized WR algorithm. However, the discretized WR algorithm is a nonstationary method, so the theorems presented in Section 4.3 apply. They guarantee WR convergence to the solution of the given system of ODE's when the global discretization error is driven to zero with the WR iterations. Reducing the error with the iteration is also a reasonable practical approach to ensuring the convergence of the WR algorithm under discretization. Timesteps for numerical integration methods are usually chosen so that estimates of the local truncation error are kept below some supplied criterion. Reducing this criterion as the relaxation iterations progress will ensure that the WR algorithm converges.

The view of the discretized WR algorithm as a nonstationary method, although simple and practical, lends no insight into why the discretized WR algorithm may not converge in some cases. It therefore provides no guidance for selecting a numerical integration method, and it does not allow for comparison to more classical integration methods. For this reason, this chapter considers in detail the interaction between WR algorithms and multistep integration methods. In the first section, the discretized WR algorithm will be analyzed assuming that every differential equation in the system is discretized identically (hereafter referred to as the global-timestep case). A simple example will be presented that demonstrates a possible breakdown of the WR method under discretization. The nonconvergence will be investigated by comparing the global-timestep discretized WR algorithm to the relaxation-Newton methods of Section 3.2. A strong comparison theorem for linear systems will be proved: the global timesteps required to ensure WR convergence are identical to the timesteps required to ensure convergence of the relaxation methods presented in Section 3.2. A convergence theorem for the fixed global-timestep discretized WR algorithm will then be presented. In the second section, the global-timestep restriction will be lifted, and a theorem demonstrating the convergence of the multi-rate timestep case for systems in normal form will be presented.

SECTION  5.1  -  THE  GLOBAL  TIMESTEP  CASE 

Consider the two-node inverter circuit in Fig. 5.1. The current equations at each node can be written by inspection, and are:

$$C\dot{x}_1 + g_1 x_1 + g_2(x_1 - x_2) = 0 \tag{5.1}$$
$$C\dot{x}_2 + g_2(x_2 - x_1) + i_{m1}(x_1, x_2) + i_{m2}(x_2) = 0$$
$$x_1(0) = x_2(0) = 0.$$

In order to generate a simple linear example, $i_{m1}$ and $i_{m2}$ were linearized about the point where the input and output voltages were equal to half of the supply voltage. Time is normalized to seconds to obtain the following 2×2 example:

$$\dot{x}_1 = -x_1 + 0.1x_2 \tag{5.2}$$
$$\dot{x}_2 = -\lambda x_1 - x_2$$
$$x_1(0) = x_2(0) = 0.$$

Note that the initial conditions given for the above example identify a stable equilibrium point.

The Gauss-Seidel WR iteration equations for the linear system example are:

$$\dot{x}_1^{k+1} = -x_1^{k+1} + 0.1x_2^k \tag{5.3}$$
$$\dot{x}_2^{k+1} = -\lambda x_1^{k+1} - x_2^{k+1}$$
$$x_1^{k+1}(0) = x_1^k(0) = x_2^{k+1}(0) = x_2^k(0) = 0.$$

Applying the implicit-Euler numerical integration method with a fixed timestep $h$, $\left(\dot{x}(nh) \approx \frac{1}{h}[x(nh) - x((n-1)h)]\right)$, to solve the decomposed equations yields the following recursion equation for $x_2^{k+1}(n)$:

$$x_2^{k+1}(n) = \frac{1}{1+h}\,x_2^{k+1}(n-1) - \frac{\lambda h}{(1+h)^{n+1}}\left[x_1^{k+1}(0) + 0.1h\sum_{j=1}^{n}(1+h)^{j-1}x_2^k(j)\right] \tag{5.4}$$


For example, let $\lambda = 200$, $h = 0.5$, and as an initial guess use $x_2^0(nh) = nh$, which is far from the exact solution $x_2(nh) = 0$. The computed sequences for the initial guess and the first, second, and third iterations of Eqn. (5.4) are presented in Table 5.1.


TABLE 5.1 - IMPLICIT-EULER COMPUTED WR ITERATIONS

STEP   TIME   INITIAL   ITER #1   ITER #2   ITER #3
  0    0.0     0.0        0         0          0
  1    0.5     0.5     -1.111     2.469     -5.487
  2    1.0     1.0     -3.704    11.52     -32.92
  3    1.5     1.5     -7.778    31.55    -111.6
  4    2.0     2.0    -13.17     66.21    -281.3
  5    2.5     2.5    -19.66    117.9     -587.5
  6    3.0     3.0    -27.02    187.9    -1075
  7    3.5     3.5    -35.07    276.0    -1786
  8    4.0     4.0    -43.64    381.5    -2751
  9    4.5     4.5    -52.60    502.9    -3992
 10    5.0     5.0    -61.85    638.4    -5519

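As Table 5.1 indicates, the WR algorithm diverges for this example. In fact, the coefficient multiplying $x_2^k(n)$ in Eqn. (5.4) is $-0.1\lambda h^2/(1+h)^2$, so Eqn. (5.4) indicates that the WR algorithm will converge for an arbitrary initial guess only if

$$\frac{h}{1+h} < \frac{1}{\sqrt{0.1\lambda}} \tag{5.5}$$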
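A minimal Python sketch of the same computation, assuming the linear example of Eqn. (5.3) with the parameters quoted above ($\lambda = 200$, $h = 0.5$, $x_2^0(nh) = nh$). The function names `implicit_euler_wr` and `satisfies_eqn_5_5` are hypothetical. Each WR iteration integrates the $x_1$ equation with the previous $x_2$ waveform and then the $x_2$ equation with the new $x_1$ waveform; the printed rows can be compared with Table 5.1, and the helper checks condition (5.5).

```python
import math


def implicit_euler_wr(lam, h, steps, iters):
    """Gauss-Seidel WR on Eqn. (5.3), each subsystem solved by implicit Euler.

    Returns [x2^0, x2^1, ..., x2^iters], each a list over n = 0..steps.
    """
    x2 = [n * h for n in range(steps + 1)]      # initial guess x2^0(nh) = nh
    history = [x2]
    for _ in range(iters):
        x1 = [0.0] * (steps + 1)                # x1^{k+1}(0) = 0
        new_x2 = [0.0] * (steps + 1)            # x2^{k+1}(0) = 0
        for n in range(1, steps + 1):
            # x1' = -x1 + 0.1 x2^k, using the previous x2 waveform
            x1[n] = (x1[n - 1] + 0.1 * h * x2[n]) / (1.0 + h)
            # x2' = -lam x1^{k+1} - x2^{k+1}, using the new x1 waveform
            new_x2[n] = (new_x2[n - 1] - lam * h * x1[n]) / (1.0 + h)
        x2 = new_x2
        history.append(x2)
    return history


def satisfies_eqn_5_5(lam, h):
    """Condition (5.5): h / (1 + h) < 1 / sqrt(0.1 * lam)."""
    return h / (1.0 + h) < 1.0 / math.sqrt(0.1 * lam)


hist = implicit_euler_wr(200.0, 0.5, 10, 3)
for n, row in enumerate(zip(*hist)):
    print(n, [round(v, 3) for v in row])
print(satisfies_eqn_5_5(200.0, 0.5), satisfies_eqn_5_5(200.0, 0.2))
```

For $\lambda = 200$, condition (5.5) holds only for $h$ below roughly 0.29, which is why $h = 0.5$ diverges.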
The constraints on the timesteps for which the global-timestep discretized WR algorithm will converge are very similar to the constraints on the timesteps for which the relaxation-Newton algorithm applied to Eqn. (3.5) will converge (see Section 3.2). In fact, for linear problems there is the following comparison theorem.


Theorem 5.1: Let a consistent and stable multistep integration algorithm be applied to an arbitrary linear system of the form

$$C\dot{x}(t) = Ax(t), \quad x(0) = x_0 \tag{5.6}$$

where $C, A \in \mathbb{R}^{n \times n}$, $C$ nonsingular, and $x(t) \in \mathbb{R}^n$. Assume further that the Gauss-Seidel (Jacobi) algebraic relaxation algorithm is used to solve the linear algebraic equations generated by the integration algorithm (as described in Section 3.2). Given a sequence of timesteps, $\{h_m\}$, the Gauss-Seidel (Jacobi) algebraic relaxation algorithm will converge at every step, for any initial guess, if and only if the global-timestep discretized Gauss-Seidel (Jacobi) WR algorithm, generated by solving the iteration equations with the same multistep integration algorithm and same timestep sequence, converges for any initial guess. ■


Proof of Theorem 5.1

The algebraic equations generated by applying a multistep integration algorithm to Eqn. (5.6) are

$$\sum_{i=0}^{k}\alpha_i C x(\tau_{m-i}) - h_m\sum_{i=0}^{l}\beta_i A x(\tau_{m-i}) = 0 \tag{5.7}$$

or, reorganizing (with the normalization $\alpha_0 = 1$),

$$[C - h_m\beta_0 A]\,x(\tau_m) + \sum_{i=1}^{k}\alpha_i C x(\tau_{m-i}) - h_m\sum_{i=1}^{l}\beta_i A x(\tau_{m-i}) = 0. \tag{5.8}$$

Let $L_C$, $D_C$, $U_C$ be the strictly lower triangular, diagonal, and strictly upper triangular portions of $C$. Similarly, let $L_A$, $D_A$, $U_A$ be the strictly lower triangular, diagonal, and strictly upper triangular portions of $A$. Using this notation, the Gauss-Seidel relaxation iteration equation applied to solving Eqn. (5.8) for $x(\tau_m)$ is

$$[(L_C + D_C) - h_m\beta_0(L_A + D_A)]\,x^k(\tau_m) + [U_C - h_m\beta_0 U_A]\,x^{k-1}(\tau_m) + \sum_{i=1}^{k}\alpha_i C x(\tau_{m-i}) - h_m\sum_{i=1}^{l}\beta_i A x(\tau_{m-i}) = 0.$$

Taking the difference between the $k$ and $k-1$ iterations and substituting $\delta^k(\tau_m)$ for $x^k(\tau_m) - x^{k-1}(\tau_m)$ leads to

$$[(L_C + D_C) - h_m\beta_0(L_A + D_A)]\,\delta^k(\tau_m) = -[U_C - h_m\beta_0 U_A]\,\delta^{k-1}(\tau_m) \tag{5.9}$$

from which it follows that the relaxation will converge at the $m$th step for any initial guess if and only if the spectral radius of

$$[(L_C + D_C) - h_m\beta_0(L_A + D_A)]^{-1}[U_C - h_m\beta_0 U_A] \tag{5.10}$$

is less than one.

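If the Gauss-Seidel WR algorithm is used to solve Eqn. (5.6), the iteration equation for $x^k(t)$ is (using the above notation)

$$(L_C + D_C)\dot{x}^k(t) + U_C\dot{x}^{k-1}(t) = (L_A + D_A)x^k(t) + U_A x^{k-1}(t). \tag{5.11}$$

Applying the multistep integration algorithm to solve Eqn. (5.11) for $x^k$ yields

$$[(L_C + D_C) - h_m\beta_0(L_A + D_A)]\,x^k(\tau_m) + [U_C - h_m\beta_0 U_A]\,x^{k-1}(\tau_m) + \sum_{i=1}^{k}\alpha_i\left[(L_C + D_C)x^k(\tau_{m-i}) + U_C x^{k-1}(\tau_{m-i})\right] - h_m\sum_{i=1}^{l}\beta_i\left[(L_A + D_A)x^k(\tau_{m-i}) + U_A x^{k-1}(\tau_{m-i})\right] = 0. \tag{5.12}$$

Taking the difference between the $k$ and $k-1$ iterations leads to

$$[(L_C + D_C) - h_m\beta_0(L_A + D_A)]\,\delta^k(\tau_m) + [U_C - h_m\beta_0 U_A]\,\delta^{k-1}(\tau_m) + \sum_{i=1}^{k}\alpha_i\left[(L_C + D_C)\delta^k(\tau_{m-i}) + U_C\delta^{k-1}(\tau_{m-i})\right] - h_m\sum_{i=1}^{l}\beta_i\left[(L_A + D_A)\delta^k(\tau_{m-i}) + U_A\delta^{k-1}(\tau_{m-i})\right] = 0. \tag{5.13}$$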
i-i 

To show that the discretized WR algorithm will only converge if the algebraic relaxation converges, let $l$ be a timestep for which the spectral radius of the matrix in Eqn. (5.10) is not less than one. Use as an initial guess any sequence for which the first $l - 1$ points are the exact solution to the discretized problem. Then $\delta^k(\tau_m) = 0$ for $m < l$, so Eqn. (5.13) is again identical to Eqn. (5.9) and is not convergent.

An inductive argument is used to prove that if the algebraic relaxation is convergent, then the discretized WR algorithm is convergent. Assume that the theorem holds for $m < l$; then $\delta^k(\tau_{l-i})$, $i \ge 1$, will go to zero as $k \to \infty$. As this occurs, Eqn. (5.13) for the $l$th step converges to Eqn. (5.9). The algebraic relaxation converges, and therefore the spectral radius of the matrix in Eqn. (5.10) for the $l$th step is less than one. This implies that Eqn. (5.9) represents a contraction mapping in some norm at the $l$th step, and the results of Section 4.3 can be applied to guarantee that Eqn. (5.13) converges at the $l$th step. Note that $\delta^k(\tau_m) = 0$ for all $m \le 0$, and therefore Eqn. (5.13) is identical to Eqn. (5.9) for $m = 1$, which completes the induction. ■

The above theorem holds for any system of the form of Eqn. (2.2) if it is assumed that an arbitrarily close initial guess for each of the relaxation schemes is available. Although this is not a realistic assumption, it does indicate that even for nonlinear systems the two algorithms present very similar timestep constraints for a numerical integration method.
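To make Theorem 5.1 concrete for the example of Eqn. (5.2), the sketch below forms the Gauss-Seidel matrix of Eqn. (5.10) for a 2×2 system and computes its spectral radius. It assumes implicit Euler ($\beta_0 = 1$), $C = I$, and $A$ taken from Eqn. (5.2) with $\lambda = 200$. The helper names are hypothetical. For a 2×2 system the strictly upper parts have a single entry, so the matrix can be formed by forward substitution.

```python
import cmath


def gs_iteration_matrix_2x2(C, A, h, beta0=1.0):
    """Matrix of Eqn. (5.10) (up to sign) for a 2x2 system C x' = A x.

    P = (L_C + D_C) - h*beta0*(L_A + D_A) is lower triangular and
    Q = U_C - h*beta0*U_A has a single nonzero entry, Q[0][1].
    Returns M = -P^{-1} Q, which maps delta^{k-1} to delta^k in Eqn. (5.9).
    """
    p11 = C[0][0] - h * beta0 * A[0][0]
    p21 = C[1][0] - h * beta0 * A[1][0]
    p22 = C[1][1] - h * beta0 * A[1][1]
    q12 = C[0][1] - h * beta0 * A[0][1]
    # Forward substitution for the only nonzero column of -P^{-1} Q.
    m12 = -q12 / p11
    m22 = -p21 * m12 / p22
    return [[0.0, m12], [0.0, m22]]


def spectral_radius_2x2(M):
    """Largest eigenvalue magnitude of a 2x2 matrix (quadratic formula)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return max(abs((tr + disc) / 2.0), abs((tr - disc) / 2.0))


lam = 200.0
C = [[1.0, 0.0], [0.0, 1.0]]
A = [[-1.0, 0.1], [-lam, -1.0]]   # Eqn. (5.2)
for h in (0.5, 0.2):
    rho = spectral_radius_2x2(gs_iteration_matrix_2x2(C, A, h))
    print(h, rho, rho < 1.0)
```

The nonzero eigenvalue is $-0.1\lambda h^2/(1+h)^2$. Requiring its magnitude to be below one gives exactly condition (5.5), consistent with the comparison theorem.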

SECTION 5.2 - GLOBAL FIXED-TIMESTEP WR CONVERGENCE THEOREM

It is possible to generalize the proof of Theorem 5.1 to a proof for the global-timestep discretized WR algorithm for nonlinear problems (but, as mentioned above, the comparison to the relaxation-Newton methods would no longer hold). A different approach will be taken, because the approach followed in Theorem 5.1 does not prove that the discretized WR algorithm converges on a fixed time interval as the timesteps become small.

To illustrate this point by example, consider solving Eqn. (5.3) using explicit-Euler. The recursion equation for the $x_2^{k+1}(n)$'s is:

$$x_2^{k+1}(n+1) = (1-h)\,x_2^{k+1}(n) - \lambda h\left[(1-h)^n x_1^{k+1}(0) + 0.1h\sum_{j=0}^{n-1}(1-h)^{n-1-j}x_2^k(j)\right]$$

The computed sequences $\{x_2^k(nh)\}$ for the initial guess and the first, second, and third iterations of the above equation are given in Table 5.2, for the case of $\lambda = 200$, $h = 0.5$ and $x_2^0(nh) = nh$.

As the table indicates, the explicit-Euler discretized WR algorithm converges in just the manner predicted by Theorem 5.1, a step (or two) with each iteration. In the same example, if half the timestep is used, similar results are achieved; that is, the relaxation converges two steps with each iteration. If it were the case that, no matter how small the timesteps became, each relaxation iteration resulted only in two more timesteps converging, then given a fixed interval of interest, the WR algorithm would not be convergent in the limit as the timesteps approached zero. This is not the case for this example, or in general for the discretized WR algorithm. If, for example, $h = 0.05$, then the relaxation converges in a more uniform manner, where the value at each timestep rapidly approaches its limit point.

TABLE 5.2 - EXPLICIT-EULER COMPUTED WR ITERATIONS

STEP   TIME   INITIAL   ITER #1   ITER #2   ITER #3
  0    0.0     0.0        0         0          0
  1    0.5     0.5        0         0          0
  2    1.0     1.0        0         0          0
  3    1.5     1.5     -0.625       0          0
  4    2.0     2.0     -1.875       0          0
  5    2.5     2.5     -3.594     0.7813       0
  6    3.0     3.0     -5.625     3.125        0
  7    3.5     3.5     -7.852     7.422     -0.977
  8    4.0     4.0    -10.19     13.67      -4.883
  9    4.5     4.5    -12.61     21.63     -13.92
 10    5.0     5.0    -15.06     30.96     -29.79
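The "two steps per iteration" behaviour can be checked with a short sketch. It assumes explicit Euler applied to the Gauss-Seidel iteration (5.3) with the initial guess $x_2^0(nh) = nh$; the function names are hypothetical. The sketch checks only the structural property that each iteration makes two more leading points exact, which does not depend on $\lambda$. It is not meant to reproduce the tabulated values. Because $x_1^{k+1}(n+1)$ depends on $x_2^k(n)$ and $x_2^{k+1}(n+1)$ depends on $x_1^{k+1}(n)$, information from the initial guess moves forward two steps per sweep.

```python
def explicit_euler_wr(lam, h, steps, iters):
    """Gauss-Seidel WR on Eqn. (5.3), each subsystem solved by explicit Euler."""
    x2 = [n * h for n in range(steps + 1)]      # initial guess x2^0(nh) = nh
    history = [x2]
    for _ in range(iters):
        x1 = [0.0] * (steps + 1)
        new_x2 = [0.0] * (steps + 1)
        for n in range(steps):
            x1[n + 1] = (1.0 - h) * x1[n] + 0.1 * h * x2[n]
            new_x2[n + 1] = (1.0 - h) * new_x2[n] - lam * h * x1[n]
        x2 = new_x2
        history.append(x2)
    return history


def exact_prefix_length(seq, exact=0.0):
    """Number of leading points equal to the exact discrete solution (zero here)."""
    count = 0
    for v in seq:
        if v != exact:
            break
        count += 1
    return count


hist = explicit_euler_wr(200.0, 0.5, 10, 3)
print([exact_prefix_length(x) for x in hist])
```

Halving $h$ leaves this prefix pattern unchanged, so on a fixed interval this argument alone would need more iterations as $h$ shrinks. The next theorem addresses this point.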

In Section 4.2, the WR algorithm was shown to be a contraction mapping; specifically,

$$\max_{[0,T]} e^{-\beta t}\|x^k(t) - y^k(t)\| \le \gamma\max_{[0,T]} e^{-\beta t}\|x^{k-1}(t) - y^{k-1}(t)\|$$

where $\gamma, \beta \in \mathbb{R}$ are dependent on the problem, and $\gamma < 1$. If $T$ is chosen small enough that $\gamma e^{\beta T} < 1$, then removing the exponential weighting gives

$$\max_{[0,T]}\|x^k(t) - y^k(t)\| \le \gamma e^{\beta T}\max_{[0,T]}\|x^{k-1}(t) - y^{k-1}(t)\|.$$

That is, the WR algorithm converges uniformly over small time intervals (this point will be discussed further in Section 6.2). The next theorem is an analogous result for the discretized case. It will be shown that the fixed global-timestep discretized WR algorithm is a contraction in a β norm (the technique was first applied to proving discrete WR convergence in [29]).
Formally, demonstrating that the discretized WR algorithm is a contraction in a β norm implies its convergence, by the contraction mapping theorem. Intuitively, convergence of the discretized WR algorithm in a β norm implies an underlying uniformity that guarantees convergence over a fixed time interval as the timesteps shrink to zero. This is the distinction between Theorem 5.1 and the next theorem.

Theorem 5.2: If, in addition to the assumptions of Theorem 4.1, $f$ in Eqn. (2.2) is differentiable, and the WR iteration equations are solved using a stable, consistent, multistep integration method with a fixed timestep $h$, then the sequences $\{x^k(\tau_m)\}$ generated by the Gauss-Seidel or Gauss-Jacobi discretized WR algorithm will converge for all $h < h_0$, for some $h_0 > 0$. ■

Before proving Theorem 5.2, some standard notation[1,59] will be presented that will also be used in the next section. The fixed-timestep multistep integration algorithms applied to

$$\dot{x}(t) = f(x(t)), \quad x(0) = x_0, \tag{5.14}$$

where $x:[0,T] \to \mathbb{R}^n$ and $f:\mathbb{R}^n \to \mathbb{R}^n$, can be represented by backward shift operators. That is, given

$$\sum_{i=0}^{k}\alpha_i x(\tau_{m-i}) = h\sum_{i=0}^{l}\beta_i f(x(\tau_{m-i})) \tag{5.15}$$

we can define

$$\rho(x(\tau_m)) = \sum_{i=0}^{k}\alpha_i x(\tau_{m-i}) \tag{5.16a}$$

and

$$\sigma(f(x(\tau_m))) = \sum_{i=0}^{l}\beta_i f(x(\tau_{m-i})). \tag{5.16b}$$

Eqn. (5.15) can then be written compactly as

$$\rho(x(\tau_m)) = h\,\sigma(f(x(\tau_m))). \tag{5.17}$$

If it is assumed that the operator $\rho$ can be inverted, i.e. that $x(\tau_m)$ can be expressed as a function of the right-hand side, then Eqn. (5.17) can be written in the form

$$x(\tau_m) = h\,\rho^{-1}\sigma(f(x(\tau_m))). \tag{5.18}$$

When such an inverse of $\rho$ exists, it can be shown that Eqn. (5.18) is equivalent to

$$x(\tau_m) = h\sum_{j=0}^{m}\gamma_j f(x(\tau_{m-j})) + x(0). \tag{5.19}$$

As an example, consider implicit-Euler applied to Eqn. (5.14). The usual form for the discrete equations is

$$x(\tau_m) - x(\tau_{m-1}) = h f(x(\tau_m)) \tag{5.20}$$

which is in the form of Eqn. (5.17). The implicit-Euler discrete equations can also be expressed in the form of Eqn. (5.19) (that is, $\gamma_j = 1$ for $j < m$ and $\gamma_m = 0$),

$$x(\tau_m) = h\sum_{j=0}^{m-1} f(x(\tau_{m-j})) + x(0). \tag{5.21}$$

The solution to Eqn. (5.21) is obviously identical to the solution to Eqn. (5.20). The form of Eqn. (5.17) for the trapezoidal rule is

$$x(\tau_m) - x(\tau_{m-1}) = 0.5h\left[f(x(\tau_m)) + f(x(\tau_{m-1}))\right] \tag{5.22}$$

which can also be expressed in the form of Eqn. (5.19) as

$$x(\tau_m) = 0.5h\left[f(x(\tau_m)) + f(x(0))\right] + h\sum_{j=1}^{m-1} f(x(\tau_{m-j})) + x(0). \tag{5.23}$$
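The equivalence between the recursive forms (5.20), (5.22) and the summed forms (5.21), (5.23) can be checked numerically. The sketch below assumes the scalar test problem $\dot{x} = ax$, so that both implicit methods can be solved in closed form at each step; the function names are hypothetical.

```python
def implicit_euler(a, x0, h, steps):
    """Eqn. (5.20) for x' = a*x: x_m - x_{m-1} = h*a*x_m."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] / (1.0 - h * a))
    return xs


def trapezoidal(a, x0, h, steps):
    """Eqn. (5.22) for x' = a*x: x_m - x_{m-1} = 0.5*h*a*(x_m + x_{m-1})."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] * (1.0 + 0.5 * h * a) / (1.0 - 0.5 * h * a))
    return xs


def sum_form_residuals(a, x0, h, steps):
    """Max mismatch between the recursions and the sum forms (5.21), (5.23)."""
    def f(x):
        return a * x

    ie = implicit_euler(a, x0, h, steps)
    tr = trapezoidal(a, x0, h, steps)
    r_ie = r_tr = 0.0
    for m in range(1, steps + 1):
        s_ie = h * sum(f(ie[m - j]) for j in range(0, m)) + x0
        s_tr = (0.5 * h * (f(tr[m]) + f(tr[0]))
                + h * sum(f(tr[m - j]) for j in range(1, m)) + x0)
        r_ie = max(r_ie, abs(ie[m] - s_ie))
        r_tr = max(r_tr, abs(tr[m] - s_tr))
    return r_ie, r_tr


print(sum_form_residuals(-2.0, 1.0, 0.1, 20))   # both at rounding level
```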


The following lemma, a special case of a theorem proved in [30], will be the key result used in the proof of Theorem 5.2.

Lemma 5.1: Let $H(b)$ be the map that represents one iteration of the algebraic Gauss-Seidel or Gauss-Jacobi relaxation algorithm applied to an equation system of the form $f(x) - b = 0$, where $x, b \in \mathbb{R}^n$ and $f:\mathbb{R}^n \to \mathbb{R}^n$. If the Jacobian of $f$, $\partial f/\partial x$, exists for all $x$ and is strictly diagonally dominant uniformly over $x$, then $H(b)$ is a contraction mapping in the $\ell_\infty$ norm and is a Lipschitz continuous function of $b$. ■


Proof of Lemma 5.1:

As usual, only the Gauss-Seidel case will be proved. It will be shown that if the Gauss-Seidel relaxation algorithm is used to solve $f(x) - b = 0$, then the map implicitly defined by one iteration of the relaxation, $H(b)$, is such that, given two arbitrary points $x^k, y^k \in \mathbb{R}^n$,

$$\|H(b)x^k - H(b)y^k\|_\infty \le \gamma\|x^k - y^k\|_\infty \tag{5.24}$$

where $\gamma < 1$. Define

$$x^{k+1,i} = (x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^k, \ldots, x_n^k)^T \tag{5.25}$$

and define $y^{k+1,i}$ similarly. The iteration equation for $x_1^{k+1}$ is implicitly defined by

$$f_1(x_1^{k+1}, x_2^k, \ldots, x_n^k) - b_1 = 0 \tag{5.26a}$$

or, using the above notation, $f_1(x^{k+1,1}) - b_1 = 0$. In the same notation, the implicit iteration equation for $y_1^{k+1}$ is

$$f_1(y^{k+1,1}) - b_1 = 0. \tag{5.26b}$$

Define the function $\psi(t) = f_1(t\,x^{k+1,1} + (1-t)\,y^{k+1,1}) - b_1$, where $t \in [0,1]$. Clearly, $\psi(0) = \psi(1) = 0$. By Rolle's theorem there exists a $t_0 \in (0,1)$ such that

$$\psi'(t_0) = 0 = \sum_{j=1}^{n}\frac{\partial f_1}{\partial x_j}(\xi)\,(x_j^{k+1,1} - y_j^{k+1,1}), \qquad \xi = t_0\,x^{k+1,1} + (1-t_0)\,y^{k+1,1}. \tag{5.27}$$

Reorganizing,

$$\frac{\partial f_1}{\partial x_1}(\xi)\,(x_1^{k+1} - y_1^{k+1}) = -\sum_{j=2}^{n}\frac{\partial f_1}{\partial x_j}(\xi)\,(x_j^k - y_j^k). \tag{5.28}$$

Dividing Eqn. (5.28) by $\partial f_1/\partial x_1$, which is bounded away from zero by the uniform strict diagonal dominance of $\partial f/\partial x$, and using the fact that $|x_j^k - y_j^k| \le \|x^k - y^k\|_\infty$ by definition, we get

$$|x_1^{k+1} - y_1^{k+1}| \le \sum_{j=2}^{n}\frac{\left|\frac{\partial f_1}{\partial x_j}(\xi)\right|}{\left|\frac{\partial f_1}{\partial x_1}(\xi)\right|}\,\|x^k - y^k\|_\infty. \tag{5.29}$$

Using the property of $f$ that the Jacobian is strictly diagonally dominant uniformly in $x$ leads to


$$|x_1^{k+1} - y_1^{k+1}| \le \gamma_1\|x^k - y^k\|_\infty \tag{5.30}$$

where $\gamma_1 < 1$. This proves that

$$|H_1(x^k) - H_1(y^k)| \le \gamma_1\|x^k - y^k\|_\infty.$$

A similar argument can be used to show

$$|H_i(x^k) - H_i(y^k)| \le \gamma_i\|x^k - y^k\|_\infty$$

where $\gamma_i < 1$. Then, if $\gamma$ is chosen to be the maximum of the $\gamma_i$'s, $\gamma < 1$ and

$$\|H(x^k) - H(y^k)\|_\infty \le \gamma\|x^k - y^k\|_\infty, \tag{5.31}$$

which proves the first part of the lemma (for a more detailed proof of the general case, see [30]).

That $H$ is a Lipschitz continuous function of $b$ can be seen by examining the implicitly defined $H_1$,

$$f_1(x_1^{k+1}, x_2^k, \ldots, x_n^k) - b_1 = 0,$$

which is solved for $x_1^{k+1}$. A simple application of the implicit function theorem[35] implies that if $\partial f_1/\partial x_1$ is bounded away from zero uniformly in $x$, then $x_1^{k+1}$ is a Lipschitz continuous function of $b_1$. The argument can be carried inductively to show that, for each $i$, $H_j$ with $j \le i$ is a Lipschitz continuous function of $b$, and therefore $H(b)$ is Lipschitz continuous with respect to $b$. ■

The formal definition of the β norm for a sequence is given below.

Definition 5.1: For a sequence $\{x(\tau_m)\}$ generated by a fixed-timestep numerical integration algorithm, the $\ell_\infty$ β norm of the sequence is defined as

$$\|\{x(\tau_m)\}\|_B = \max_m e^{-Bhm}\|x(\tau_m)\|_\infty \tag{5.32}$$

where $h$ is the fixed timestep and $B \in \mathbb{R}$. ■
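The contraction property of Lemma 5.1 can be spot-checked numerically. The sketch below assumes a hypothetical test function $f(x) = Ax + 0.1\sin(x)$, with $A$ chosen so that the Jacobian $A + 0.1\,\mathrm{diag}(\cos x)$ is strictly diagonally dominant uniformly in $x$. One nonlinear Gauss-Seidel sweep $H(b)x$ solves each scalar equation by Newton's method. The bound $\gamma = \max_i \sum_{j \ne i}|a_{ij}| / (|a_{ii}| - 0.1)$ follows the ratio in Eqn. (5.29). The names `f_i` and `gauss_seidel_sweep` are illustrative only.

```python
import math
import random

# Hypothetical test system (not from the thesis): f(x) = A x + 0.1 sin(x).
A = [[4.0, 1.0, -1.0],
     [0.5, 3.0, 1.0],
     [1.0, -1.0, 5.0]]


def f_i(i, x):
    return sum(A[i][j] * x[j] for j in range(3)) + 0.1 * math.sin(x[i])


def gauss_seidel_sweep(x, b):
    """One nonlinear Gauss-Seidel iteration H(b)x for f(x) - b = 0."""
    x = list(x)
    for i in range(3):
        # Solve f_i(...) = b_i for x_i by Newton; d f_i / d x_i is
        # A[i][i] + 0.1 cos(x_i), which is bounded away from zero.
        for _ in range(50):
            step = (f_i(i, x) - b[i]) / (A[i][i] + 0.1 * math.cos(x[i]))
            x[i] -= step
            if abs(step) < 1e-15:
                break
    return x


# Contraction bound from the proof: max_i sum_{j != i} |a_ij| / (|a_ii| - 0.1)
gamma = max(sum(abs(A[i][j]) for j in range(3) if j != i) / (abs(A[i][i]) - 0.1)
            for i in range(3))

random.seed(0)
x = [random.uniform(-5, 5) for _ in range(3)]
y = [random.uniform(-5, 5) for _ in range(3)]
b = [1.0, -2.0, 0.5]
hx, hy = gauss_seidel_sweep(x, b), gauss_seidel_sweep(y, b)
print(gamma, max(abs(p - q) for p, q in zip(hx, hy)),
      max(abs(p - q) for p, q in zip(x, y)))
```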


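The following simple lemma will be useful for the proof of Theorem 5.2.

Lemma 5.2: Given an arbitrary sequence $\{x(\tau_m)\}$ and coefficients $\gamma_i$, the following inequality holds:

$$\left\|\left\{\sum_{i=1}^{m}\gamma_i\,x(\tau_{m-i})\right\}\right\|_B \le M\,\frac{e^{-Bh}}{1 - e^{-Bh}}\,\|\{x(\tau_m)\}\|_B \tag{5.33}$$

where $M = \max_i|\gamma_i|$ and $B > 0$. ■

Proof of Lemma 5.2:

The proof of Lemma 5.2 follows from a simple algebraic argument. From Definition 5.1,

$$\left\|\left\{\sum_{i=1}^{m}\gamma_i\,x(\tau_{m-i})\right\}\right\|_B = \max_m e^{-Bhm}\left\|\sum_{i=1}^{m}\gamma_i\,x(\tau_{m-i})\right\|_\infty. \tag{5.35}$$

Using the norm properties, the term

$$e^{-Bhm}\left\|\sum_{i=1}^{m}\gamma_i\,x(\tau_{m-i})\right\|_\infty \tag{5.36}$$

can be bounded by

$$e^{-Bhm}\sum_{i=1}^{m}|\gamma_i|\,\|x(\tau_{m-i})\|_\infty. \tag{5.37}$$

Inserting $e^{Bh(m-i)}e^{-Bh(m-i)} = 1$ into Eqn. (5.37) yields

$$e^{-Bhm}\sum_{i=1}^{m}|\gamma_i|\,e^{Bh(m-i)}e^{-Bh(m-i)}\|x(\tau_{m-i})\|_\infty. \tag{5.38}$$

As

$$e^{-Bh(m-i)}\|x(\tau_{m-i})\|_\infty \le \|\{x(\tau_m)\}\|_B, \tag{5.39}$$

Eqn. (5.38) leads to

$$e^{-Bhm}\sum_{i=1}^{m}|\gamma_i|\,e^{Bh(m-i)}\,\|\{x(\tau_m)\}\|_B. \tag{5.40}$$

Reorganizing,

$$\left[\sum_{i=1}^{m}|\gamma_i|\,e^{-Bhi}\right]\|\{x(\tau_m)\}\|_B. \tag{5.41}$$

If $|\gamma_i|$ is bounded above by $M$, then Eqn. (5.41) is bounded by

$$M\sum_{i=1}^{m}e^{-Bhi}\,\|\{x(\tau_m)\}\|_B. \tag{5.42}$$

Given that $e^{-Bhi}$ is always positive, the following inequality holds,

$$\sum_{i=1}^{m}e^{-Bhi} \le \sum_{i=1}^{\infty}e^{-Bhi},$$

and from the infinite series summation formula (valid for $Bh > 0$)

$$\sum_{i=1}^{\infty}e^{-Bhi} = \frac{e^{-Bh}}{1 - e^{-Bh}}.$$

Using the two in Eqn. (5.42) produces the conclusion of the lemma. ■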
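Definition 5.1 and Lemma 5.2 translate directly into code. The sketch below implements the β norm of a vector sequence and checks inequality (5.33) on random data. It assumes arbitrary test coefficients $\gamma_i$ rather than those of a particular integration method, and the helper names are hypothetical.

```python
import math
import random


def beta_norm(seq, B, h):
    """Definition 5.1: max_m e^{-B h m} * ||x(tau_m)||_inf."""
    return max(math.exp(-B * h * m) * max(abs(v) for v in x)
               for m, x in enumerate(seq))


def weighted_history(gammas, seq):
    """Sequence whose m-th term is sum_{i=1}^{m} gamma_i x(tau_{m-i})."""
    dim = len(seq[0])
    out = []
    for m in range(len(seq)):
        term = [0.0] * dim
        for i in range(1, m + 1):
            for c in range(dim):
                term[c] += gammas[i] * seq[m - i][c]
        out.append(term)
    return out


def lemma_5_2_bound(gammas, seq, B, h):
    """Right-hand side of Eqn. (5.33) with M = max_{i>=1} |gamma_i|."""
    M = max(abs(g) for g in gammas[1:])
    return M * math.exp(-B * h) / (1.0 - math.exp(-B * h)) * beta_norm(seq, B, h)


random.seed(1)
h, steps = 0.1, 50
seq = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(steps + 1)]
gammas = [random.uniform(-1, 1) for _ in range(steps + 1)]
for B in (0.5, 2.0, 10.0):
    lhs = beta_norm(weighted_history(gammas, seq), B, h)
    print(B, lhs, lemma_5_2_bound(gammas, seq, B, h))
```

Larger $B$ weights late points less and shrinks the factor $e^{-Bh}/(1 - e^{-Bh})$. The proof of Theorem 5.2 exploits this effect.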


Proof of Theorem 5.2:

As before, only the Gauss-Seidel case will be proved. In order to ensure charge conservation, the decomposed differential equations generated by the WR algorithm are solved using charge as the state variable. That is, the multistep integration algorithm is applied to

$$\frac{d}{dt}q_i(x^{k+1,i}(t), u(t)) = f_i(x^{k+1,i}(t), u(t)) \tag{5.43}$$

where $x^{k+1,i}(t)$, defined in Eqn. (5.25), is usually the vector of node voltages. A proof for the case where $x$ is used as the state variable is given in [29]. Applying the multistep integration algorithm using the notation described above, and assuming $h_m = h$ for all $m$,

$$\rho(q_i(x^{k+1,i}(\tau_m), u(\tau_m))) = h\,\sigma(f_i(x^{k+1,i}(\tau_m), u(\tau_m))). \tag{5.44}$$

Solving, using the "inverse" operator, yields

$$q_i(x^{k+1,i}(\tau_m), u(\tau_m)) = h\,\rho^{-1}\sigma(f_i(x^{k+1,i}(\tau_m), u(\tau_m))). \tag{5.45}$$

Using the sum form for $\rho^{-1}\sigma$, and pulling out the leading term,

$$q_i(x^{k+1,i}(\tau_m), u(\tau_m)) - h\gamma_0 f_i(x^{k+1,i}(\tau_m), u(\tau_m)) - h\sum_{j=1}^{m}\gamma_j f_i(x^{k+1,i}(\tau_{m-j}), u(\tau_{m-j})) - q_i(x(0), u(0)) = 0. \tag{5.46}$$

Define $F_i(x(\tau_m))$ as

$$F_i(x(\tau_m)) = q_i(x(\tau_m), u(\tau_m)) - h\gamma_0 f_i(x(\tau_m), u(\tau_m)) \tag{5.47}$$

and define $b(x_{past}, k) \in \mathbb{R}^n$ by

$$b_i(x_{past}, k) = h\sum_{j=1}^{m}\gamma_j f_i(x^{k+1,i}(\tau_{m-j}), u(\tau_{m-j})) + q_i(x(0), u(0)) \tag{5.48}$$

where $x_{past}$ denotes the fact that $b$ is a function of $x^{k+1}(\tau_l)$ and $x^k(\tau_l)$ for all $l < m$. Then Eqn. (5.45) is identical to one iteration of the algebraic Gauss-Seidel algorithm applied to solving

$$F(x(\tau_m)) - b(x_{past}, k) = 0 \tag{5.49}$$

for $x(\tau_m)$. As in Lemma 5.1, $x^{k+1}$ can be written in terms of the map $H(b(x_{past}, k))$, defined implicitly by the Gauss-Seidel relaxation algorithm applied to Eqn. (5.49),

$$x^{k+1}(\tau_m) = H(b(x_{past}, k))\,x^k(\tau_m). \tag{5.50}$$

To prove that the iteration described by Eqn. (5.50) is a contraction mapping on the sequence $\{x^k(\tau_m)\}$, it will be shown that, given two arbitrary sequences $\{x^k(\tau_m)\}$ and $\{y^k(\tau_m)\}$,

$$\max_m e^{-Bmh}\|x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\|_\infty \le \tilde{\gamma}\max_m e^{-Bmh}\|x^k(\tau_m) - y^k(\tau_m)\|_\infty \tag{5.51}$$

for some $\tilde{\gamma} < 1$, where we will use the notation

$$\max_m e^{-Bmh}\|x^k(\tau_m)\|_\infty = \|\{x^k(\tau_m)\}\|_B. \tag{5.52}$$

To start, Eqn. (5.50) leads to the following equation for the difference between the two sequences at the $m$th step,

$$x^{k+1}(\tau_m) - y^{k+1}(\tau_m) = H(b(x_{past}, k))\,x^k(\tau_m) - H(b(y_{past}, k))\,y^k(\tau_m). \tag{5.53}$$

Breaking into separate differences,

$$x^{k+1}(\tau_m) - y^{k+1}(\tau_m) = H(b(x_{past}, k))\,x^k(\tau_m) - H(b(x_{past}, k))\,y^k(\tau_m) + H(b(x_{past}, k))\,y^k(\tau_m) - H(b(y_{past}, k))\,y^k(\tau_m) \tag{5.54}$$

and taking $\ell_\infty$ norms,

$$\|x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\|_\infty \le \|H(b(x_{past}, k))\,x^k(\tau_m) - H(b(x_{past}, k))\,y^k(\tau_m)\|_\infty + \|H(b(x_{past}, k))\,y^k(\tau_m) - H(b(y_{past}, k))\,y^k(\tau_m)\|_\infty. \tag{5.55}$$

At this point we will demonstrate that, for small $h$, Eqn. (5.55) satisfies the assumptions of Lemma 5.1. It is assumed that the Jacobian of $q$ with respect to $x$, $C(x(t), u(t))$, is strictly diagonally dominant uniformly in $x$. By definition, this assumption implies that there exists an $\epsilon > 0$ such that

$$|C_{ii}(x(t), u(t))| \ge \epsilon + \sum_{j \ne i}|C_{ij}(x(t), u(t))|. \tag{5.56}$$

Let $l > 0$ be the Lipschitz constant of $f$ with respect to $x$. Assuming $f$ is differentiable, if $\gamma_0 = 0$ (and therefore the method is explicit) or if $h < \left|\epsilon/(\gamma_0 l)\right|$, then $\frac{\partial F}{\partial x(\tau_m)}(x(\tau_m))$ is strictly diagonally dominant. Assuming $h$ is small enough that this holds, Lemma 5.1 can be applied to show

$$\|H(b(x_{past}, k))\,x^k(\tau_m) - H(b(x_{past}, k))\,y^k(\tau_m)\|_\infty \le \gamma\|x^k(\tau_m) - y^k(\tau_m)\|_\infty \tag{5.57}$$

for some $\gamma < 1$, and

$$\|H(b(x_{past}, k))\,y^k(\tau_m) - H(b(y_{past}, k))\,y^k(\tau_m)\|_\infty \le l_H\|b(x_{past}, k) - b(y_{past}, k)\|_\infty \tag{5.58}$$

where $l_H$ is the Lipschitz constant of $H$ with respect to $b$ (taken as the maximum over $m$).

Substituting Eqn. (5.57) and Eqn. (5.58) into Eqn. (5.55),

$$\|x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\|_\infty \le \gamma\|x^k(\tau_m) - y^k(\tau_m)\|_\infty + l_H\|b(x_{past}, k) - b(y_{past}, k)\|_\infty. \tag{5.59}$$

Multiplying by $e^{-Bmh}$ and taking the maximum over $m$,

$$\max_m e^{-Bmh}\|x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\|_\infty \le \gamma\max_m e^{-Bmh}\|x^k(\tau_m) - y^k(\tau_m)\|_\infty + l_H\max_m e^{-Bmh}\|b(x_{past}, k) - b(y_{past}, k)\|_\infty \tag{5.60}$$

or, using Definition 5.1,

$$\|\{x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\}\|_B \le \gamma\|\{x^k(\tau_m) - y^k(\tau_m)\}\|_B + l_H\|\{b(x_{past}, k) - b(y_{past}, k)\}\|_B. \tag{5.61}$$
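The term $b(x_{past}, k) - b(y_{past}, k)$ in Eqn. (5.60) can be expanded using the definition of $b$ in Eqn. (5.48) to

$$b_i(x_{past}, k) - b_i(y_{past}, k) = h\sum_{j=1}^{m}\gamma_j\left[f_i(x^{k+1,i}(\tau_{m-j}), u(\tau_{m-j})) - f_i(y^{k+1,i}(\tau_{m-j}), u(\tau_{m-j}))\right]. \tag{5.62}$$

The β norm of the sequence whose terms are given in Eqn. (5.62) can be bounded using Lemma 5.2. That is,

$$\left\|\left\{h\sum_{j=1}^{m}\gamma_j\left[f(x^{k+1,i}(\tau_{m-j}), u(\tau_{m-j})) - f(y^{k+1,i}(\tau_{m-j}), u(\tau_{m-j}))\right]\right\}\right\|_B \le hM\frac{e^{-Bh}}{1 - e^{-Bh}}\left\|\left\{f(x^{k+1,i}(\tau_m), u(\tau_m)) - f(y^{k+1,i}(\tau_m), u(\tau_m))\right\}\right\|_B \tag{5.63}$$

where $M = \max_m|\gamma_m|$. Using the triangle inequality and the Lipschitz property of $f$,

$$hM\frac{e^{-Bh}}{1 - e^{-Bh}}\left\|\left\{f(x^{k+1,i}(\tau_m), u(\tau_m)) - f(y^{k+1,i}(\tau_m), u(\tau_m))\right\}\right\|_B \le hMl\frac{e^{-Bh}}{1 - e^{-Bh}}\left(\|\{x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\}\|_B + \|\{x^k(\tau_m) - y^k(\tau_m)\}\|_B\right) \tag{5.64}$$

where $l$ is the Lipschitz constant of $f$ with respect to $x$. This bound can be used in Eqn. (5.60) to yield

$$\left(1 - l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}\right)\|\{x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\}\|_B \le \left(\gamma + l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}\right)\|\{x^k(\tau_m) - y^k(\tau_m)\}\|_B \tag{5.65}$$

or

$$\|\{x^{k+1}(\tau_m) - y^{k+1}(\tau_m)\}\|_B \le \frac{\gamma + l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}}{1 - l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}}\,\|\{x^k(\tau_m) - y^k(\tau_m)\}\|_B. \tag{5.66}$$

Since $h\,\frac{e^{-Bh}}{1 - e^{-Bh}} = \frac{h}{e^{Bh} - 1} \le \frac{1}{B}$ for all $h > 0$, and $\gamma < 1$, there exist an $h_0$ and a $B > 0$ such that

$$\frac{\gamma + l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}}{1 - l_H hMl\frac{e^{-Bh}}{1 - e^{-Bh}}} < 1 \tag{5.67}$$

for all $h < h_0$. Here $h_0$ is small enough that $\partial F/\partial x$ is strictly diagonally dominant, and this proves the theorem. ■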
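The closing step of the proof can be illustrated numerically. The sketch below evaluates the ratio in Eqns. (5.66)-(5.67) for hypothetical constants ($\gamma = 0.5$, $l_H = M = 1$, $l = 10$), which are chosen only for illustration. It shows that a large enough $B$ keeps the ratio below one uniformly over a wide range of $h$, while a small $B$ can fail. It uses $e^{-Bh}/(1 - e^{-Bh}) = 1/(e^{Bh} - 1)$, evaluated with `math.expm1` for accuracy at small $Bh$.

```python
import math


def contraction_factor(gamma, l_H, M, l, h, B):
    """Ratio in Eqns. (5.66)-(5.67); None if the denominator is not positive."""
    c = l_H * h * M * l / math.expm1(B * h)   # = l_H h M l e^{-Bh} / (1 - e^{-Bh})
    if c >= 1.0:
        return None
    return (gamma + c) / (1.0 - c)


# Hypothetical constants for illustration only.
gamma, l_H, M, l = 0.5, 1.0, 1.0, 10.0
hs = [10.0 ** (-k) for k in range(0, 7)]
for B in (5.0, 100.0):
    print(B, [contraction_factor(gamma, l_H, M, l, h, B) for h in hs])
```

With $B = 100$ the term $c$ never exceeds $l_H M l / B = 0.1$, so the ratio stays at or below about 0.67 for every $h$ listed. With $B = 5$ the denominator becomes non-positive for moderate $h$.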

SECTION 5.3 - THE MULTI-RATE WR CONVERGENCE THEOREM

Theorem 5.1 suggests that the global-timestep discretized WR algorithm is not going to be any more efficient than the well-known relaxation-Newton algorithms described in Section 3.2, as the timestep constraints for the two methods are identical for the linear case. In fact, as Eqn. (5.10) indicates, WR is likely to be less efficient, because decomposition errors made in the first few timesteps propagate through the computations. The discretized WR algorithm has nevertheless proved more efficient in practice for some types of problems, because it is intrinsically a multi-rate integration method. Because this is the key aspect of the WR algorithm, the rest of this chapter is devoted to a proof that the discretized WR algorithm converges even when the timesteps for each subsystem are chosen independently.

Usually, the choice of how to interpolate the discrete sequence of points produced by a numerical integration method is based only on what will produce attractive graphs of the computed solution. When multi-rate integration methods are applied to solving a system, interpolation plays a much more significant role. Suppose two state variables in a system interact and are computed using different timesteps. To provide the value of the first variable at the times required to compute the second, the first variable must be interpolated. In the case of WR, if the interpolation is not done carefully, convergence can be destroyed.

In this section, a convergence theorem for systems in normal form will be presented that demonstrates the key role of interpolation in the convergence of the multi-rate discretized WR algorithm. The theorem guarantees that the discretized WR algorithm is a contraction mapping, assuming that the points produced by the numerical integration method are interpolated linearly. As the proof will demonstrate, linear interpolation has one particular property that aids convergence.

Consider the following system:

$$\dot{x}(t) = f(x(t), u(t)) \tag{5.68}$$

where $x(t) = (x_1(t), \ldots, x_n(t))^T$, $x(t) \in \mathbb{R}^n$, $u(t) \in \mathbb{R}^n$ is piecewise continuous, and $f = (f_1(x), \ldots, f_n(x))^T$, $f:\mathbb{R}^n \to \mathbb{R}^n$, is Lipschitz continuous. If the Gauss-Seidel WR algorithm is applied to Eqn. (5.68), the iteration equation for $x_i$ is

$$\dot{x}_i^{k+1}(t) = f_i(x_1^{k+1}(t), \ldots, x_i^{k+1}(t), x_{i+1}^k(t), \ldots, x_n^k(t), u(t)). \tag{5.69}$$

If Eqn. (5.69) is solved numerically using a multistep integration algorithm with a fixed timestep $h$, the iteration equation becomes

$$\rho(x_i^{k+1}(\tau_m)) = h\,\sigma\left(f_i(x_1^{k+1}(\tau_m), \ldots, x_i^{k+1}(\tau_m), x_{i+1}^k(\tau_m), \ldots, x_n^k(\tau_m), u(\tau_m))\right). \tag{5.70}$$

If different timesteps are used to solve the differential equations associated with the $x_i$ and $x_j$ variables, $i \ne j$, then Eqn. (5.70) makes no sense, because $\tau_m$ for the $j$th equation may be different from $\tau_m$ for the $i$th equation. In order even to write down the equations for the multi-rate case, some kind of interpolation operator must first be defined.
Definition 5.2: Given a finite sequence $\{y(\tau_m)\}$, $m = 0, \ldots, M$, where $y(\tau_m) \in \mathbb{R}$, an interpolation function $I_t\{\cdot\}$ on the sequence is any function that maps the sequence and the independent variable $t \in \mathbb{R}$, $t \in [\tau_0, \tau_M]$, into $\mathbb{R}$, such that $I_t\{\cdot\}$ is continuous with respect to $t$, and that

$$I_{\tau_j}\{y(\tau_m)\} = y(\tau_j).$$


As an example, the linear interpolation of a sequence at a given time $t \in [\tau_0, \tau_M]$ would be

$$I_t\{y(\tau_m)\} = y(\tau_j) + \frac{y(\tau_{j+1}) - y(\tau_j)}{\tau_{j+1} - \tau_j}\,(t - \tau_j) \tag{5.71}$$

where $j$ is such that $\tau_j \le t \le \tau_{j+1}$.
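A direct transcription of Eqn. (5.71) is short. The following Python sketch is only an illustration (the name `linear_interp` and the sample data are hypothetical, not taken from any original program); it evaluates $I_t\{y(\tau_m)\}$ and checks the interpolation property required by Definition 5.2.

```python
import bisect


def linear_interp(taus, ys, t):
    """Evaluate I_t{y(tau_m)} by linear interpolation, as in Eqn. (5.71).

    taus: strictly increasing sample times tau_0 < ... < tau_M (at least two)
    ys:   sample values y(tau_0), ..., y(tau_M)
    t:    query time in [tau_0, tau_M]
    """
    if not taus[0] <= t <= taus[-1]:
        raise ValueError("t lies outside [tau_0, tau_M]")
    # pick j with tau_j <= t <= tau_{j+1}; t == tau_M uses the last interval
    j = min(bisect.bisect_right(taus, t) - 1, len(taus) - 2)
    slope = (ys[j + 1] - ys[j]) / (taus[j + 1] - taus[j])
    return ys[j] + slope * (t - taus[j])


taus = [0.0, 0.1, 0.25, 0.5]
ys = [0.0, 1.0, 0.4, 2.0]

# Definition 5.2: the interpolant reproduces the sequence at its own sample times
assert all(abs(linear_interp(taus, ys, tj) - yj) < 1e-12 for tj, yj in zip(taus, ys))
# halfway between tau_0 and tau_1 the value is halfway between y(tau_0) and y(tau_1)
assert abs(linear_interp(taus, ys, 0.05) - 0.5) < 1e-12
```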

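In order to write a form of Eqn. (5.70) for the multi-rate case, we will denote $\tau_m$ for the $i$-th equation as $\tau_m^i$. Using this notation and the interpolation operator defined above, the unfortunately index-laden iteration equation for $x_i^{k+1}$ in the multi-rate fixed-timestep case is

$$\rho\big(x_i^{k+1}(\tau_m^i)\big) = h_i\,\sigma\Big(f_i\big(I_{\tau_m^i}\{x_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{x_{i-1}^{k+1}(\tau_l^{i-1})\}, x_i^{k+1}(\tau_m^i), I_{\tau_m^i}\{x_{i+1}^{k}(\tau_l^{i+1})\}, \ldots, I_{\tau_m^i}\{x_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big)$$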


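where $h_i$ is the fixed timestep for the $i$-th subsystem. Using the inverse operator as in Eqn. (5.25),

$$x_i^{k+1}(\tau_m^i) = \rho^{-1}\sigma\Big[h_i f_i\big(I_{\tau_m^i}\{x_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{x_{i-1}^{k+1}(\tau_l^{i-1})\}, x_i^{k+1}(\tau_m^i), I_{\tau_m^i}\{x_{i+1}^{k}(\tau_l^{i+1})\}, \ldots, I_{\tau_m^i}\{x_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big] \tag{5.72}$$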


The proof of Theorem 5.2 demonstrated that the fixed global-timestep discretized WR algorithm is a contraction mapping in a $B$ norm on the sequence (see Definition 5.1). In the multi-rate case, this is not sufficient. Since interpolated as well as sequence values are used by subsystems, a convergence proof must take into account the effect of the interpolation on the sequence. The approach used in the proof that follows is to view the multi-rate discretized WR algorithm, which necessarily includes an interpolation operator, as a map of continuous functions on $[0,T]$ to continuous functions. The implicitly defined map can be derived by applying the interpolation operator to both sides of Eqn. (5.72) to yield


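$$I_t\{x_i^{k+1}(\tau_m^i)\} = I_t\Big\{\rho^{-1}\sigma\Big[h_i f_i\big(I_{\tau_m^i}\{x_1^{k+1}(\tau_l^1)\}, \ldots, x_i^{k+1}(\tau_m^i), \ldots, I_{\tau_m^i}\{x_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big]\Big\} \tag{5.73}$$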


To prove the convergence of the relaxation, the usual continuous-time $B$ norm can be used,

$$\|x\|_B = \max_i \Big[\max_{[0,T]} e^{-Bt}\,\big|I_t\{x_i(\tau_m^i)\}\big|\Big] \tag{5.74a}$$

or, equivalently,

$$\|x\|_B = \max_i \Big[\max_{[0,T]} e^{-Bt}\,|x_i(t)|\Big] \tag{5.74b}$$

where $x$ is used to denote the vector function on $[0,T]$ defined by $x_i(t) = I_t\{x_i(\tau_m^i)\}$.


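Under certain conditions Eqn. (5.73) is a contraction map in the $B$ norm of Eqn. (5.74). To prove this, Eqn. (5.73) will be applied to two sequences $\{x^k(\tau_m)\}$ and $\{y^k(\tau_m)\}$. The difference between Eqn. (5.73) applied to the two sequences is

$$\begin{aligned} I_t\{x_i^{k+1}(\tau_m^i)\} - I_t\{y_i^{k+1}(\tau_m^i)\} = {} & I_t\Big\{\rho^{-1}\sigma\Big[h_i f_i\big(I_{\tau_m^i}\{x_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{x_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big]\Big\} \\ & - I_t\Big\{\rho^{-1}\sigma\Big[h_i f_i\big(I_{\tau_m^i}\{y_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{y_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big]\Big\} \end{aligned} \tag{5.75}$$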


It is possible to simplify Eqn. (5.75) by limiting the type of interpolation operators to those that are linear functions of the sequence. To avoid confusion: this does not restrict consideration to linear interpolation only, but to those interpolation functions for which

$$I_t\{x(\tau_m)\} - I_t\{y(\tau_m)\} = I_t\{x(\tau_m) - y(\tau_m)\}$$

and

$$I_t\{a\,x(\tau_m)\} = a\,I_t\{x(\tau_m)\}$$

where $\{x(\tau_m)\}$, $\{y(\tau_m)\}$ are sequences in $\mathbb{R}$, and $a \in \mathbb{R}$. For example, any of the spline or polynomial interpolation operators are linear functions of the sequence. Exploiting this linearity in Eqn. (5.75) leads to

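$$\begin{aligned} I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\} = I_t\Big\{\rho^{-1}\sigma\Big[h_i\Big( & f_i\big(I_{\tau_m^i}\{x_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{x_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big) \\ & - f_i\big(I_{\tau_m^i}\{y_1^{k+1}(\tau_l^1)\}, \ldots, I_{\tau_m^i}\{y_n^{k}(\tau_l^n)\}, u(\tau_m^i)\big)\Big)\Big]\Big\} \end{aligned} \tag{5.76}$$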
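The two linearity conditions are easy to check numerically. The sketch below (names and data are hypothetical) compares piecewise-linear interpolation, computed with `numpy.interp`, against full polynomial interpolation written in Lagrange form; both satisfy the conditions, even though only the piecewise-linear operator has the property that Lemma 5.3 below singles out.

```python
import numpy as np


def lagrange_interp(taus, ys, t):
    """Polynomial interpolation through every sample point (Lagrange form)."""
    total = np.zeros_like(np.asarray(t, dtype=float))
    for j, (tj, yj) in enumerate(zip(taus, ys)):
        basis = np.ones_like(total)
        for m, tm in enumerate(taus):
            if m != j:
                basis = basis * (t - tm) / (tj - tm)
        total = total + yj * basis
    return total


rng = np.random.default_rng(0)
taus = np.array([0.0, 0.2, 0.5, 0.7, 1.0])
ts = np.linspace(0.0, 1.0, 101)
x, y, a = rng.normal(size=5), rng.normal(size=5), 3.7

operators = {
    "linear": lambda s: np.interp(ts, taus, s),            # Eqn. (5.71)
    "polynomial": lambda s: lagrange_interp(taus, s, ts),
}
for name, interp in operators.items():
    # I_t{x} - I_t{y} = I_t{x - y}   and   I_t{a x} = a I_t{x}
    assert np.allclose(interp(x) - interp(y), interp(x - y)), name
    assert np.allclose(interp(a * x), a * interp(x)), name
```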


It is possible to show that the multi-rate discretized WR algorithm is a contraction mapping in the $B$ norm of Eqn. (5.74) if the interpolation operator is limited to linear interpolation (as in Eqn. (5.71)) and the timesteps are made small enough.


Theorem 5.3: If the interpolation in Eqn. (5.73) is linear interpolation, then there exists a collection of timesteps $h_i^* > 0$, $i \in \{1, \ldots, n\}$, such that if $0 < h_i < h_i^*$ for all $i$, then the multi-rate fixed-timestep discretized WR algorithm converges with respect to the interpolated sequences.

The following simple lemmas will be useful in the proof.

Lemma 5.3: If $I_t\{\cdot\}$ is the linear interpolation operator (as in Eqn. (5.71)), then given two sequences $\{x(\tau_m)\}$ and $\{y(\tau_m)\}$, if $x(\tau_m) \ge y(\tau_m)$ for all $m$, then $I_t\{x(\tau_m)\} \ge I_t\{y(\tau_m)\}$ for all $t$ for which the interpolation is defined. In addition, if $x(\tau_m) = K$, $K \in \mathbb{R}$, for all $m \le \bar{m}$, then $I_t\{x(\tau_m)\} = K$ for $t \le \tau_{\bar{m}}$.

Lemma 5.3 follows directly from the definition of linear interpolation. As will be shown in the proof, this is the key property of linear interpolation with respect to Theorem 5.3.
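Both parts of Lemma 5.3 can be spot-checked for piecewise-linear interpolation using `numpy.interp`, which evaluates Eqn. (5.71). The random sequences below are arbitrary test data chosen for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
taus = np.sort(rng.uniform(0.0, 1.0, size=8))
ts = np.linspace(taus[0], taus[-1], 500)

# Order preservation: x(tau_m) >= y(tau_m) for every m ...
y = rng.normal(size=taus.size)
x = y + rng.uniform(0.0, 1.0, size=taus.size)
# ... implies I_t{x} >= I_t{y} for every t in [tau_0, tau_M]
order_preserved = bool(np.all(np.interp(ts, taus, x) >= np.interp(ts, taus, y)))

# Constant preservation: x(tau_m) = K for m <= 3 implies I_t{x} = K for t <= tau_3
K = 2.5
z = rng.normal(size=taus.size)
z[:4] = K
early = ts[ts <= taus[3]]
constant_preserved = bool(np.allclose(np.interp(early, taus, z), K))

print(order_preserved, constant_preserved)
```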

Lemma 5.4: If $I_t\{\cdot\}$ is the linear interpolation operator and $\{x(\tau_m)\}$ is a sequence in $\mathbb{R}$ with fixed timestep $h$, then

$$\max_{[0,T]} e^{-Bt}\,\Big|I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\,|x(\tau_{m-l})|\Big\}\Big| \le \frac{M}{1 - e^{-Bh}}\,\max_{[0,T]} e^{-Bt}\,\big|I_t\{|x(\tau_m)|\}\big|$$

where $M = \max_l |\gamma_l|$.

The proof of Lemma 5.4 parallels the arguments given in the proof of Lemma 5.2, and is omitted.

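Proof of Theorem 5.3:

Expanding the $\rho^{-1}\sigma$ operator into its sum form, with the $\gamma_l$ denoting the coefficients of that expansion, leads to

$$\begin{aligned} I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\} = I_t\Big\{\sum_{l=0}^{m} \gamma_l\, h_i\Big[ & f_i\big(I_{\tau_{m-l}^i}\{x_1^{k+1}(\tau^1)\}, \ldots, I_{\tau_{m-l}^i}\{x_n^{k}(\tau^n)\}, u(\tau_{m-l}^i)\big) \\ & - f_i\big(I_{\tau_{m-l}^i}\{y_1^{k+1}(\tau^1)\}, \ldots, I_{\tau_{m-l}^i}\{y_n^{k}(\tau^n)\}, u(\tau_{m-l}^i)\big)\Big]\Big\} \end{aligned}$$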

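Using Lemma 5.3 and the Lipschitz continuity property of $f_i$,

$$\big|I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\}\big| \le I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\, h_i\Big[\sum_{j=1}^{i} L_{ij}\,\big|I_{\tau_{m-l}^i}\{x_j^{k+1}(\tau^j) - y_j^{k+1}(\tau^j)\}\big| + \sum_{j=i+1}^{n} L_{ij}\,\big|I_{\tau_{m-l}^i}\{x_j^{k}(\tau^j) - y_j^{k}(\tau^j)\}\big|\Big]\Big\} \tag{5.77}$$

where $L_{ij}$ is the Lipschitz constant of $f_i$ with respect to $x_j$. Reorganizing, and exploiting the general linearity property of the interpolation operator and the triangle inequality,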


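$$\begin{aligned} \big|I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\}\big| \le {} & \sum_{j=1}^{i} L_{ij}\, h_i\, I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\,\big|I_{\tau_{m-l}^i}\{x_j^{k+1}(\tau^j) - y_j^{k+1}(\tau^j)\}\big|\Big\} \\ & + \sum_{j=i+1}^{n} L_{ij}\, h_i\, I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\,\big|I_{\tau_{m-l}^i}\{x_j^{k}(\tau^j) - y_j^{k}(\tau^j)\}\big|\Big\} \end{aligned} \tag{5.78}$$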
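Multiplying by $e^{-Bt}$ and taking maximums,

$$\begin{aligned} \max_{[0,T]} e^{-Bt}\big|I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\}\big| \le {} & \sum_{j=1}^{i} L_{ij}\, h_i \max_{[0,T]} e^{-Bt}\, I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\,\big|I_{\tau_{m-l}^i}\{x_j^{k+1}(\tau^j) - y_j^{k+1}(\tau^j)\}\big|\Big\} \\ & + \sum_{j=i+1}^{n} L_{ij}\, h_i \max_{[0,T]} e^{-Bt}\, I_t\Big\{\sum_{l=0}^{m} |\gamma_l|\,\big|I_{\tau_{m-l}^i}\{x_j^{k}(\tau^j) - y_j^{k}(\tau^j)\}\big|\Big\} \end{aligned} \tag{5.79}$$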


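Applying Lemma 5.4,

$$\max_{[0,T]} e^{-Bt}\big|I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\}\big| \le \sum_{j=1}^{i} \frac{M h_i L_{ij}}{1 - e^{-Bh_i}}\,\|x^{k+1} - y^{k+1}\|_B + \sum_{j=i+1}^{n} \frac{M h_i L_{ij}}{1 - e^{-Bh_i}}\,\|x^{k} - y^{k}\|_B \tag{5.80}$$

where $\|x^k - y^k\|_B$ is the $B$ norm defined in Eqn. (5.74b).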

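For any $\delta > 0$ there exists a collection of steps $\{h_i^*\}$, all strictly positive, and a $B > 0$ such that

$$\delta > \sum_{j=1}^{n} \frac{M h_i L_{ij}}{1 - e^{-Bh_i}} \tag{5.81}$$

for all $h_i < h_i^*$, for all $i$. (Since $h/(1 - e^{-Bh}) \le h + 1/B$, it suffices to choose $B$ large and then the $h_i^*$ small.) Substituting into Eqn. (5.80),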


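$$\max_{[0,T]} e^{-Bt}\big|I_t\{x_i^{k+1}(\tau_m^i) - y_i^{k+1}(\tau_m^i)\}\big| \le \delta\,\|x^{k+1} - y^{k+1}\|_B + \delta\,\|x^{k} - y^{k}\|_B \tag{5.82}$$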
Since Eqn. (5.81) holds for all $i$,

$$\|x^{k+1} - y^{k+1}\|_B \le \delta\,\|x^{k+1} - y^{k+1}\|_B + \delta\,\|x^{k} - y^{k}\|_B \tag{5.83}$$

Reorganizing,

$$\|x^{k+1} - y^{k+1}\|_B \le \frac{\delta}{1 - \delta}\,\|x^{k} - y^{k}\|_B \tag{5.84}$$

Clearly, there exists a $\delta > 0$ such that Eqn. (5.84) is a contraction ($\delta/(1-\delta) < 1$, that is, any $\delta < 1/2$). Let $\hat{\delta}$ be that $\delta$. Since there exists a $B > 0$ and a collection of $h_i^* > 0$ such that Eqn. (5.84) holds for $\delta = \hat{\delta}$ for all $0 < h_i < h_i^*$, the theorem is proved. ■

Perhaps the most surprising aspect of the proof of Theorem 5.3 is that the ratio of the timesteps from one system to the next does not seem to play a role. This is an extremely important observation given that the discretized WR algorithm was developed to allow different subsystems to take vastly different timesteps. If a large ratio between timesteps destroyed the WR convergence, then the applicability of the WR algorithm to multi-rate problems would be limited.
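The independence from the timestep ratio can be tried out numerically. The sketch below rests on assumptions made only for this illustration: a hypothetical two-node linear system, backward Euler on each node, node 1 taking 40, 400 or 4000 steps while node 2 always takes 40, and linear interpolation (`numpy.interp`) between the two time grids. Because linear interpolation never increases the maximum magnitude of a sequence (a consequence of Lemma 5.3), for this particular system the uniform-norm change between successive iterates shrinks by at least a factor of 5 per sweep, whatever the ratio $h_2/h_1$.

```python
import numpy as np


def multirate_gs_wr(n1, n2, T=1.0, tol=1e-12, max_iter=100):
    """Gauss-Seidel WR for a hypothetical two-node linear system
        x1' = -5*x1 + x2 + sin(2*pi*t),  x1(0) = 0
        x2' =    x1 - 2*x2,              x2(0) = 1
    x1 uses n1 backward-Euler steps, x2 uses n2; each node reads the other
    node's waveform through linear interpolation. Returns the list of
    max-norm changes between successive iterates."""
    h1, h2 = T / n1, T / n2
    t1, t2 = np.linspace(0.0, T, n1 + 1), np.linspace(0.0, T, n2 + 1)
    u = np.sin(2.0 * np.pi * t1)
    x1, x2 = np.zeros(n1 + 1), np.ones(n2 + 1)      # flat initial guesses
    diffs = []
    for _ in range(max_iter):
        x2_on_t1 = np.interp(t1, t2, x2)            # I_t{x2^k} at x1's time points
        new1 = np.zeros(n1 + 1)
        for m in range(n1):                         # backward Euler, step h1
            new1[m + 1] = (new1[m] + h1 * (x2_on_t1[m + 1] + u[m + 1])) / (1.0 + 5.0 * h1)
        x1_on_t2 = np.interp(t2, t1, new1)          # Gauss-Seidel: use the newest x1
        new2 = np.ones(n2 + 1)
        for m in range(n2):                         # backward Euler, step h2
            new2[m + 1] = (new2[m] + h2 * x1_on_t2[m + 1]) / (1.0 + 2.0 * h2)
        diffs.append(max(np.abs(new1 - x1).max(), np.abs(new2 - x2).max()))
        x1, x2 = new1, new2
        if diffs[-1] < tol:
            break
    return diffs


for n1 in (40, 400, 4000):            # step ratios h2/h1 of 1, 10 and 100
    diffs = multirate_gs_wr(n1, n2=40)
    print(f"h2/h1 = {n1 // 40:3d}: converged in {len(diffs)} iterations")
```

The closed-form backward Euler update is possible here only because each node equation is scalar and linear in its own unknown; a nonlinear node would need a Newton iteration at every step.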

A second important observation is that the only property of linear interpolation used in the course of the proof, beyond the linearity in the sequence values shared by all the operators considered above, was that stated in Lemma 5.3. Therefore, other interpolations that have this property will work as well. Higher-order polynomial interpolation functions do not have the property stated in Lemma 5.3, but as they are substantially more accurate than linear interpolation, they are extremely useful. An extension of the above theorem to general polynomial interpolation does not seem to be straightforward, and may call for an entirely different approach.
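A small hand-picked example shows how polynomial interpolation violates Lemma 5.3. In this sketch the sample differences $d(\tau_m) = x(\tau_m) - y(\tau_m)$ are all nonnegative and the first two are the constant 0, yet the cubic through the four points, $p(t) = t(t-1)(7/6 - t/3)$, equals $-0.25$ at $t = 0.5$. Both parts of the lemma therefore fail, while linear interpolation gives 0 there.

```python
import numpy as np

taus = np.array([0.0, 1.0, 2.0, 3.0])
d = np.array([0.0, 0.0, 1.0, 1.0])    # d = x - y >= 0 at every sample point

cubic = np.polyval(np.polyfit(taus, d, deg=3), 0.5)   # cubic through all 4 points
linear = np.interp(0.5, taus, d)                        # Eqn. (5.71)

print(f"cubic: {cubic:+.4f}   linear: {linear:+.4f}")
```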
CHAPTER 6 - ACCELERATING WR CONVERGENCE

In Chapter 7, several techniques used to improve the efficiency and robustness of the WR algorithm when applied to simulating MOS circuits will be described. In this chapter the theoretical background for two of these techniques will be presented. We will first analyze nonuniformity in WR convergence, which explains why breaking the simulation interval into pieces, called windows, can be used to reduce the number of relaxation iterations required to achieve convergence. Then we will consider how to partition large systems into subsystems in such a way that the WR algorithm will converge rapidly.

SECTION  6.1  -  UNIFORMITY  OF  WR  CONVERGENCE 

The convergence theorem presented in Section 3.2 guarantees that the WR algorithm is a contraction mapping in an exponentially weighted norm. In this section, we will examine the implications of this choice of norm. First, the WR algorithm will be applied to two example problems to exhibit the different manners in which the algorithm converges. We will then prove that for a special class of systems WR converges in a uniform manner, or formally, that WR is a contraction in an unweighted norm for any time interval. Because most circuit problems do not generate systems in this special class, we will prove that the WR algorithm is a contraction in an unweighted norm for any system for which the WR algorithm converges, if the time interval is made short enough. This suggests that the number of iterations required to achieve WR convergence can be reduced by breaking the simulation interval into short pieces, and in Chapter 7 we will present an adaptive algorithm that attempts to exploit this property of WR.

Consider the following system of nonlinear ordinary differential equations in $x_1(t), x_2(t) \in \mathbb{R}$ with input $u \in \mathbb{R}$ that approximately describes the cross-coupled NOR logic gate in Fig. 6.1a (the approximate equations represent a normalization that converts the simulation interval $[0,T]$ to $[0,1]$).

$$\begin{aligned} \dot{x}_1 &= (5 - x_1) - x_1 (x_2)^2 - 5 x_1 u \\ \dot{x}_2 &= (5 - x_2) - x_2 (x_1)^2 \\ x_1(0) &= 5.0, \qquad x_2(0) = 0.0 \end{aligned} \tag{6.1}$$

The Gauss-Seidel WR Algorithm given in Section 1.2 was used to solve for the behavior of the cross-coupled NOR gate circuit approximated by the above small system of equations. Fig. 6.1b shows plots of the input $u(t)$, the exact solution for $x_1(t)$, and the relaxation iteration waveforms for $x_1(t)$ for the 5th, 10th and 20th iterations. The plots demonstrate a property typical of the WR algorithm when applied to systems with strong coupling: the difference between the iteration waveforms and the correct solution is not reduced at every time point in the waveform. Instead, each iteration lengthens the interval of time, starting from zero, over which the waveform is close to the exact solution.
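A small Python sketch reproduces the qualitative behavior of Fig. 6.1b. It applies Gauss-Seidel WR to Eqn. (6.1), integrating every node with forward Euler ($h = 0.005$), and uses a hypothetical input $u(t)$, a unit step at $t = 0.25$, because the input used for the figure is not specified in the text. With an explicit one-step method the growth of the "converged" interval is literal: $x_1$ at step $m+1$ depends on $x_2$ only at step $m$, so iteration $k$ reproduces the directly integrated forward-Euler waveform exactly on at least its first $k$ steps.

```python
H, N = 0.005, 200                 # forward Euler on the normalized interval [0, 1]


def u(t):
    # hypothetical input: a unit step at t = 0.25
    return 1.0 if t >= 0.25 else 0.0


def f1(x1, x2, t):
    return (5.0 - x1) - x1 * x2 ** 2 - 5.0 * x1 * u(t)


def f2(x1, x2):
    return (5.0 - x2) - x2 * x1 ** 2


def direct():
    """Forward Euler applied to the coupled system of Eqn. (6.1), no relaxation."""
    x1, x2 = [5.0], [0.0]
    for m in range(N):
        x1.append(x1[m] + H * f1(x1[m], x2[m], m * H))
        x2.append(x2[m] + H * f2(x1[m], x2[m]))
    return x1


def gauss_seidel_wr(iterations):
    """Gauss-Seidel WR on Eqn. (6.1); returns the x1 waveform of every iteration."""
    x2_prev = [0.0] * (N + 1)                  # initial guess: x2 held at x2(0)
    history = []
    for _ in range(iterations):
        x1 = [5.0]
        for m in range(N):                     # x1 sees the previous x2 waveform
            x1.append(x1[m] + H * f1(x1[m], x2_prev[m], m * H))
        x2 = [0.0]
        for m in range(N):                     # x2 sees the new x1 waveform
            x2.append(x2[m] + H * f2(x1[m], x2[m]))
        history.append(x1)
        x2_prev = x2
    return history


x1_ref = direct()
history = gauss_seidel_wr(20)
for k in (5, 10, 20):
    err = [abs(a - b) for a, b in zip(history[k - 1], x1_ref)]
    first_bad = next((m for m, e in enumerate(err) if e > 1e-3), None)
    where = "nowhere" if first_bad is None else f"from t = {first_bad * H:.3f}"
    print(f"iteration {k:2d}: |error in x1| > 1e-3 {where}")
```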

As an example of "better" convergence, consider the following system of differential equations in $x_1, x_2, x_3$ with input $u$ that approximately describes the shift register in Fig. 6.2a (here the simulation interval $[0,T]$ has been normalized to $[0,1]$):

$$\begin{aligned} \dot{x}_1 &= (5.0 - x_1) - x_1 (u)^2 - (x_1 - x_2) \\ \dot{x}_2 &= (x_1 - x_2) \\ \dot{x}_3 &= (5.0 - x_3) - x_3 (x_2)^2 \\ x(0) &= 0 \end{aligned} \tag{6.2}$$


The Gauss-Seidel WR Algorithm given in Section 1.2 was used to solve the original system approximated by the above system of equations. The input $u(t)$, the exact solution for $x_1(t)$, and the waveforms for $x_1(t)$ computed from the first, second, and third iterations of the WR algorithm are plotted in Fig. 6.2b. As the plots for this example show, the difference between the iteration waveforms and the correct solution is reduced throughout the entire waveform.

Perhaps surprisingly, the behavior of the first example is consistent with the WR convergence theorem, even though that theorem states that the iterations converge uniformly. This is because it was proved that the WR method is a contraction map in the following nonuniform norm on $C([0,T], \mathbb{R}^n)$:

$$\max_{[0,T]} e^{-bt}\,\|f(t)\|$$

where $b > 0$, $f(t) \in \mathbb{R}^n$, and $\|\cdot\|$ is a norm on $\mathbb{R}^n$. Note that $\|f(t)\|$ can increase as $e^{bt}$ without increasing the value of this function space norm. If $f(t)$ grows slowly, or is bounded, it is possible to reduce the function space norm by reducing $\|f(t)\|$ only on some small initial interval $[0, t_1] \subset [0,T]$, though it will be necessary to lengthen this interval to further decrease the function space norm. The waveforms in the more slowly converging example above converge in just this way; the function space norm is decreased after every iteration of the WR algorithm because significant errors are reduced over larger and larger intervals of time.

The  examples  above  lead  to  the  following  definition: 

Definition 6.1: A differential system of the form given in Eqn. (2.2) is said to have the strict WR contractivity property on $[0,T]$ if the WR algorithm applied to the system contracts in a uniform norm on $[0,T]$, i.e., there is a $\gamma < 1$ such that

$$\max_{[0,T]} \|x^{k+1}(t) - x^{k}(t)\| \le \gamma \max_{[0,T]} \|x^{k}(t) - x^{k-1}(t)\| \tag{6.3}$$

where $x^k(t) \in \mathbb{R}^n$, $t \in [0,T]$, is the $k$-th iterate of Algorithm 4.1 and $\|\cdot\|$ is any norm on $\mathbb{R}^n$. If the WR algorithm applied to the system is a contraction in a uniform norm on $[0,T]$ for any $T > 0$, then we say that the system has the strict WR contractivity property on $[0, \infty)$. ■
For a system of equations to have the strict WR contractivity property on $[0, \infty)$ it must be more than just loosely coupled. In addition, the decomposed equations solved at each iteration of the waveform relaxation must be well-damped, so that errors due to the decomposition die off in time, instead of accumulating or growing. As an example, we will prove that a system in normal form,

$$\dot{x}(t) = f(x(t), u(t)), \qquad x(0) = x_0 \tag{6.4}$$

where $x(t) \in \mathbb{R}^n$ on $t \in [0,T]$; $u(t) \in \mathbb{R}^r$ on $t \in [0,T]$ is piecewise continuous; and $f: \mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$ is globally Lipschitz continuous, will have the strict WR contractivity property on $[0,T]$ for any $T < \infty$ if $f$ has a property we refer to as diagonally dominant negative monotonicity. This property, which we define precisely below, implies that the original system is loosely coupled and that the decomposed equations generated by a WR algorithm are well-damped. (A similar result in a different setting can be found in [61].)

Definition 6.2: Let $f(x,u)$ be a continuous map from $\mathbb{R}^n \times \mathbb{R}^r \to \mathbb{R}^n$, where $x \in \mathbb{R}^n$, $u \in \mathbb{R}^r$, and $f$ is globally Lipschitz continuous with respect to $x$ for all $u \in \mathbb{R}^r$. $f$ is said to be negative monotone if there exists a positive number $\lambda$ such that

$$(x - y) \cdot \big(f(x,u) - f(y,u)\big) \le -\lambda\,(x - y) \cdot (x - y) \tag{6.5}$$

(here $\cdot$ indicates a scalar product) for all $x, y \in \mathbb{R}^n$ and $u \in \mathbb{R}^r$. Let $e^i \in \mathbb{R}^n$ be the $i$-th unit vector. Then $f$ is said to be diagonally negative monotone if there exists a collection of positive numbers $\lambda_i$ such that

$$\varepsilon\, e^i \cdot \big(f(x + \varepsilon e^i, u) - f(x,u)\big) \le -\lambda_i\, \varepsilon^2 \tag{6.6}$$

for any positive $\varepsilon \in \mathbb{R}$, $x \in \mathbb{R}^n$ and $u \in \mathbb{R}^r$. If $f$ is globally Lipschitz continuous, there exist positive numbers $l_{ij}$, $i, j \in \{1, \ldots, n\}$, such that for any $\varepsilon \in \mathbb{R}$

$$\big|e^i \cdot \big(f(x + \varepsilon e^j, u) - f(x,u)\big)\big| \le l_{ij}\,|\varepsilon| \tag{6.7}$$

for all $x \in \mathbb{R}^n$, $u \in \mathbb{R}^r$. A mapping $f$ is diagonally dominant negative monotone if $f$ is strictly diagonally negative monotone and $\lambda_i > \sum_{j \ne i} l_{ij}$, where $\lambda_i$ is as in Eqn. (6.6). (This is a stricter definition than in previous literature [30].)

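For a linear map $f(x,u) = Ax + b(u)$ (an assumption made only for this illustration), Eqn. (6.6) holds with $\lambda_i = -A_{ii}$ and the smallest constants in Eqn. (6.7) are $l_{ij} = |A_{ij}|$, so diagonally dominant negative monotonicity reduces to strict row diagonal dominance with a negative diagonal. The Python check below uses hypothetical example matrices; the second one mimics two nodes joined by a large conductance and fails the test.

```python
import numpy as np


def is_diag_dominant_negative_monotone(A):
    """Definition 6.2 for a hypothetical linear map f(x, u) = A x + b(u).

    For linear f, Eqn. (6.6) holds with lambda_i = -A[i, i], and the smallest
    constants in Eqn. (6.7) are l_ij = |A[i, j]|.
    """
    A = np.asarray(A, dtype=float)
    lam = -np.diag(A)
    off_diag = np.abs(A).sum(axis=1) - np.abs(np.diag(A))   # sum over j != i of l_ij
    return bool(np.all(lam > 0) and np.all(lam > off_diag))


loosely_coupled = [[-3.0, 1.0, 0.0],
                   [0.5, -2.0, 1.0],
                   [0.0, 1.0, -4.0]]
# nodes 1 and 2 joined by a conductance g = 10; node 1 also has conductance 1 to ground
tightly_coupled = [[-11.0, 10.0],
                   [10.0, -10.0]]
print(is_diag_dominant_negative_monotone(loosely_coupled))   # True
print(is_diag_dominant_negative_monotone(tightly_coupled))   # False
```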

To prove the theorem about diagonally dominant negative monotone systems we will use the following lemma.

Lemma 6.1: Let $k, x(t), \dot{x}(t) \in \mathbb{R}$ be such that

$$x(t)\,\dot{x}(t) \le -\lambda\, x(t)\,x(t) + k\,x(t), \qquad x(0) = 0 \tag{6.8}$$

for some positive number $\lambda$. Then $|x(t)| \le |k|\,\lambda^{-1}$ for all $t \ge 0$.

Proof of Lemma 6.1:

Substituting $\frac{d}{dt}|x(t)|^2$ for $2x(t)\dot{x}(t)$ in Eqn. (6.8) and taking absolute values,

$$\frac{d}{dt}|x(t)|^2 \le -2\lambda\,|x(t)|^2 + 2|k|\,|x(t)|.$$

Therefore, $\frac{d}{dt}|x(t)| \le -\lambda\,|x(t)| + |k|$ or $|x(t)| = 0$. This implies, by a theorem in differential inequalities [9], that

$$|x(t)| \le \frac{|k|}{\lambda}\big(1 - e^{-\lambda t}\big) \le \frac{|k|}{\lambda}$$

which proves the lemma. ■

We now prove the theorem.

Theorem 6.1: Let a system of equations of the form of Eqn. (6.4) be such that $f(x,u)$ is diagonally dominant negative monotone. Then the system has the strict WR contractivity property on $[0,T]$ for all $T < \infty$. ■

Proof of Theorem 6.1:

Again we will only present the proof for the Gauss-Seidel case, but the result holds for the Gauss-Jacobi case also. The iteration equations for the Gauss-Seidel WR algorithm applied to Eqn. (6.4) are, for each $i \in \{1, \ldots, n\}$,

$$\dot{x}_i^{k+1} = f_i\big(x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^{k}, \ldots, x_n^{k}, u\big) \tag{6.9}$$


where $x$, $u$, and $f$ are functions of time, but the dependence on time has been dropped for notational convenience. Taking the difference between the $k$ and $k+1$ iteration for each $i \in \{1, \ldots, n\}$ yields:

$$\dot{x}_i^{k+1} - \dot{x}_i^{k} = f_i\big(x_1^{k+1}, \ldots, x_i^{k+1}, x_{i+1}^{k}, \ldots, x_n^{k}, u\big) - f_i\big(x_1^{k}, \ldots, x_i^{k}, x_{i+1}^{k-1}, \ldots, x_n^{k-1}, u\big)$$

Multiplying through by $x_i^{k+1} - x_i^k$ and using the Lipschitz and diagonal negative monotone properties of $f$ we get

$$(x_i^{k+1} - x_i^k)(\dot{x}_i^{k+1} - \dot{x}_i^k) \le -\lambda_i\,(x_i^{k+1} - x_i^k)^2 + |x_i^{k+1} - x_i^k|\Big[\sum_{j=1}^{i-1} l_{ij}\,|x_j^{k+1} - x_j^k| + \sum_{j=i+1}^{n} l_{ij}\,|x_j^{k} - x_j^{k-1}|\Big] \tag{6.10}$$

where $|x_i|$ denotes the absolute value of $x_i$, and $l_{ij}$ and $\lambda_i$ are as in Definition 6.2. Since $x_i^{k+1}(0) - x_i^k(0) = 0$, the estimate in Lemma 6.1, applied with $|k|$ replaced by the maximum over $[0,T]$ of the bracketed term, gives

$$\max_{[0,T]}|x_i^{k+1} - x_i^k| \le \sum_{j=1}^{i-1} l_{ij}\lambda_i^{-1} \max_{[0,T]}|x_j^{k+1} - x_j^k| + \sum_{j=i+1}^{n} l_{ij}\lambda_i^{-1} \max_{[0,T]}|x_j^{k} - x_j^{k-1}| \tag{6.11}$$

Let $A \in \mathbb{R}^{n \times n}$ be a matrix defined by $A_{ij} = l_{ij}\lambda_i^{-1}$ for $i \ne j$ and $A_{ii} = 0$. Then $A = L + U$, where $L$ is strictly lower triangular and $U$ is strictly upper triangular. Rewriting Eqn. (6.11) in matrix form,

$$(I - L)\,|x^{k+1} - x^k| \le U\,|x^k - x^{k-1}| \tag{6.12}$$

where $|x|$ is the vector whose elements are the maxima over $[0,T]$ of the absolute values of the elements of $x$, and the inequality holds for each element-by-element comparison. Showing that Eqn. (6.12) implies $\|x^{k+1} - x^k\|_\infty < \|x^k - x^{k-1}\|_\infty$ requires a slightly more careful argument, since multiplying both sides of an elementwise inequality by a matrix does not in general preserve it. Here, however, $(I - L)$ has unity on the diagonal and nonpositive strictly lower triangular off-diagonal entries, so $(I - L)^{-1} = I + L + \cdots + L^{n-1}$ is elementwise nonnegative. Hence if $r$ is the solution to $(I - L)\,r = U\,|x^k - x^{k-1}|$, then $r \ge |x^{k+1} - x^k|$. Given that $f$ is diagonally dominant, $\gamma = \|(I - L)^{-1}U\|_\infty < 1$ (Lemma 4.2), from which it follows that $\|x^{k+1} - x^k\|_\infty \le \|r\|_\infty \le \gamma\,\|x^k - x^{k-1}\|_\infty$. Then from Eqn. (6.12) we get

$$\max_{[0,T]}\|x^{k+1}(t) - x^{k}(t)\|_\infty \le \gamma \max_{[0,T]}\|x^{k}(t) - x^{k-1}(t)\|_\infty \tag{6.13}$$

for any $T < \infty$, which proves the theorem. ■
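The contraction factor in the proof of Theorem 6.1 can be evaluated directly. This sketch (the function name is hypothetical) forms $A_{ij} = l_{ij}/\lambda_i$, splits it into $L$ and $U$, checks that $(I - L)^{-1}$ is elementwise nonnegative, and returns $\gamma = \|(I - L)^{-1}U\|_\infty$. The data correspond to the illustrative linear map with matrix $[[-3, 1, 0], [0.5, -2, 1], [0, 1, -4]]$, for which $\lambda = (3, 2, 4)$ and $l_{ij} = |A_{ij}|$.

```python
import numpy as np


def gs_wr_contraction_factor(lam, l):
    """gamma = ||(I - L)^{-1} U||_inf as in the proof of Theorem 6.1.

    lam[i] = lambda_i from Eqn. (6.6); l[i][j] = l_ij from Eqn. (6.7).
    """
    lam = np.asarray(lam, dtype=float)
    A = np.asarray(l, dtype=float) / lam[:, None]     # A_ij = l_ij / lambda_i
    np.fill_diagonal(A, 0.0)
    L, U = np.tril(A, k=-1), np.triu(A, k=1)
    inv = np.linalg.inv(np.eye(len(lam)) - L)
    if np.any(inv < -1e-12):                          # (I - L)^{-1} must be nonnegative
        raise ValueError("(I - L)^{-1} has a negative entry")
    return np.abs(inv @ U).sum(axis=1).max()


lam = [3.0, 2.0, 4.0]
l = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
gamma = gs_wr_contraction_factor(lam, l)
print(round(gamma, 4))   # 0.5833, i.e. 7/12 < 1
```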

As the cross-coupled NOR gate example indicates, many systems of interest do not have the strict WR contractivity property on $[0,T]$ for all $T < \infty$. However, we will prove that any system that satisfies the WR convergence theorem will also have the strict WR contractivity property on some nonzero interval.

Theorem 6.2: For any system of the form of Eqn. (2.2) which satisfies the assumptions of the WR convergence theorem (Theorem 4.1) there exists a $T > 0$ such that the system has the strict WR contractivity property on $[0,T]$.

Proof of Theorem 6.2

We prove the theorem only for the Gauss-Seidel WR algorithm but, as before, the theorem also holds for the Gauss-Jacobi case. Starting with Eqn. (4.8) and substituting $x^k$ for $x^{k+1}$,

$$\begin{aligned} \dot{x}^{k+1}(t) - \dot{x}^{k}(t) = {} & (L_{k+1}(t) + D_{k+1}(t))^{-1} U_{k+1}(t)\,\dot{x}^{k}(t) - (L_{k}(t) + D_{k}(t))^{-1} U_{k}(t)\,\dot{x}^{k-1}(t) \\ & + (L_{k+1}(t) + D_{k+1}(t))^{-1} f(x^{k+1}, x^{k}, u) - (L_{k}(t) + D_{k}(t))^{-1} f(x^{k}, x^{k-1}, u) \end{aligned} \tag{6.14}$$

To simplify the notation, let $A_k(t), B_k(t) \in \mathbb{R}^{n \times n}$ be defined by $A_k(t) = (L_k(t) + D_k(t))^{-1} U_k(t)$ and $B_k(t) = (L_k(t) + D_k(t))^{-1}$. It is important to keep in mind that $(L_k(t) + D_k(t))^{-1} U_k(t)$ and $(L_k(t) + D_k(t))^{-1}$ are functions of $x^k$, and by definition, so are $A_k(t)$ and $B_k(t)$. Expanding the above equation and integrating,

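$$\begin{aligned} \int_0^t \big(\dot{x}^{k+1}(\tau) - \dot{x}^{k}(\tau)\big)\,d\tau = {} & \int_0^t A_{k+1}(\tau)\big(\dot{x}^{k}(\tau) - \dot{x}^{k-1}(\tau)\big)\,d\tau + \int_0^t \big[A_{k+1}(\tau) - A_k(\tau)\big]\,\dot{x}^{k-1}(\tau)\,d\tau \\ & + \int_0^t B_{k+1}(\tau)\big[f(x^{k+1}(\tau), x^{k}(\tau), u(\tau)) - f(x^{k}(\tau), x^{k-1}(\tau), u(\tau))\big]\,d\tau \\ & + \int_0^t \big[B_{k+1}(\tau) - B_k(\tau)\big]\, f(x^{k}(\tau), x^{k-1}(\tau), u(\tau))\,d\tau \end{aligned} \tag{6.15}$$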

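Integrating by parts and using the fact that $x^k(0) - x^{k-1}(0) = 0$,

$$\begin{aligned} x^{k+1}(t) - x^{k}(t) = {} & A_{k+1}(t)\big[x^{k}(t) - x^{k-1}(t)\big] - \int_0^t \dot{A}_{k+1}(\tau)\big[x^{k}(\tau) - x^{k-1}(\tau)\big]\,d\tau + \int_0^t \big[A_{k+1}(\tau) - A_k(\tau)\big]\,\dot{x}^{k-1}(\tau)\,d\tau \\ & + \int_0^t B_{k+1}(\tau)\big[f(x^{k+1}(\tau), x^{k}(\tau), u(\tau)) - f(x^{k}(\tau), x^{k-1}(\tau), u(\tau))\big]\,d\tau \\ & + \int_0^t \big[B_{k+1}(\tau) - B_k(\tau)\big]\, f(x^{k}(\tau), x^{k-1}(\tau), u(\tau))\,d\tau \end{aligned} \tag{6.16}$$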

Taking norms, and using the Lipschitz continuity of $f$, $A_k(t)$, and $B_k(t)$, and the uniform boundedness of $B_k(t)$ in $x$ (see Theorem 4.1):

$$\|x^{k+1}(t) - x^{k}(t)\| - \int_0^t (l_1 K + k_1 M + k_3 N)\,\|x^{k+1}(\tau) - x^{k}(\tau)\|\,d\tau \le \gamma\,\|x^{k}(t) - x^{k-1}(t)\| + \int_0^t (l_2 K + k_1 M + 2k_2 M + k_4 N)\,\|x^{k}(\tau) - x^{k-1}(\tau)\|\,d\tau \tag{6.17}$$

where $l_1, l_2$ are the Lipschitz constants of $f$ with respect to $x^{k+1}$ and $x^k$ respectively; $k_1, k_2$ and $k_3, k_4$ are the Lipschitz constants of $A_{k+1}(t)$ and $B_{k+1}(t)$, respectively, with respect to their $x^{k+1}$ and $x^k$ arguments; $K$ is the uniform bound on $\|B_k(t)\|$; $\gamma = \max \|(L_k + D_k)^{-1} U_k\| < 1$; and $M$ and $N$ are the a priori bounds on $\dot{x}^k$ and $f$ found in the proof of Theorem 4.1. Note that $\big\|\frac{d}{d\tau} A_{k+1}(\tau)\big\| \le k_1 M + k_2 M$. Moving the max (over $t$) norms outside the integrals and integrating yields

$$\max_{[0,T]}\|x^{k+1}(t) - x^{k}(t)\| \le \frac{\gamma + T(l_2 K + k_1 M + 2k_2 M + k_4 N)}{1 - T(l_1 K + k_1 M + k_3 N)}\,\max_{[0,T]}\|x^{k}(t) - x^{k-1}(t)\| \tag{6.18}$$
Since $\gamma < 1$, a $T^* > 0$ exists such that

$$\alpha = \frac{\gamma + T^*(l_2 K + k_1 M + 2k_2 M + k_4 N)}{1 - T^*(l_1 K + k_1 M + k_3 N)} < 1.$$

With this $T^*$, Eqn. (6.18) becomes

$$\max_{[0,T^*]}\|x^{k+1} - x^{k}\| \le \alpha \max_{[0,T^*]}\|x^{k} - x^{k-1}\| \tag{6.19}$$

for $\alpha < 1$, which proves the theorem. ■

Theorem 6.2 guarantees that the WR algorithm is a contraction mapping in a uniform norm for any system, provided the interval of time over which the waveforms are computed is made small enough. This suggests that the interval of simulation $[0,T]$ should be broken up into windows, $[0, T_1], [T_1, T_2], \ldots, [T_{n-1}, T_n]$ with $T_n = T$, where the size of each window is small enough that the WR algorithm contracts uniformly throughout the entire window. The smaller the window is made, the faster the convergence. However, as the window size becomes smaller, the advantages of the waveform relaxation are lost. Scheduling overhead increases when the windows become smaller, since each subsystem must be processed at each iteration in every window. If the windows are made very small, timesteps chosen to calculate the waveforms are limited by the window size rather than by the local truncation error, and the advantages of the multi-rate nature of WR will be lost.

The lower bound for the region over which WR contracts uniformly given in Theorem 6.2 is too conservative in most cases to be of direct practical use. As mentioned above, in order for the WR algorithm to be efficient it is important to pick the largest windows over which the iterations actually contract uniformly, but the theorem only provides a worst-case estimate. Since it is difficult to determine a priori a reasonable window size to use for a given nonlinear problem, window sizes are usually determined dynamically, by monitoring the computed iterations (see Chapter 7) [18]. Since Theorem 6.2 guarantees the convergence of WR over any finite interval, a dynamic scheme does not have to pick the window sizes very accurately. The only cost of a bad choice of window is a loss of efficiency; the relaxation still converges.
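The windowing strategy can be sketched on the NOR gate equations of Eqn. (6.1). This is an illustration only, not the adaptive scheme of Chapter 7. The window length is fixed, the input $u(t)$ is a hypothetical unit step at $t = 0.25$, and every node uses forward Euler with $h = 0.005$. Because the discretization is explicit, the Gauss-Seidel iteration inside a window reaches the directly integrated forward-Euler solution exactly after finitely many sweeps (roughly half the window length at most). The sketch therefore iterates until a sweep changes nothing, and it counts work as the total number of integration steps over all sweeps.

```python
H, N = 0.005, 200                 # forward Euler on the normalized interval [0, 1]


def u(t):
    return 1.0 if t >= 0.25 else 0.0            # hypothetical step input


def f1(x1, x2, t):
    return (5.0 - x1) - x1 * x2 ** 2 - 5.0 * x1 * u(t)


def f2(x1, x2):
    return (5.0 - x2) - x2 * x1 ** 2


def direct():
    """Forward Euler on the coupled system, for reference."""
    x1, x2 = [5.0], [0.0]
    for m in range(N):
        x1.append(x1[m] + H * f1(x1[m], x2[m], m * H))
        x2.append(x2[m] + H * f2(x1[m], x2[m]))
    return x1


def sweep(x1_0, x2_0, x2_guess, m0, steps):
    """One Gauss-Seidel WR iteration over the window starting at step m0."""
    x1 = [x1_0]
    for m in range(steps):
        x1.append(x1[m] + H * f1(x1[m], x2_guess[m], (m0 + m) * H))
    x2 = [x2_0]
    for m in range(steps):
        x2.append(x2[m] + H * f2(x1[m], x2[m]))
    return x1, x2


def windowed_wr(window_steps, tol=0.0, max_iter=1000):
    """Gauss-Seidel WR window by window; returns the x1 waveform and the work done."""
    x1_all, x2_all, work = [5.0], [0.0], 0
    for m0 in range(0, N, window_steps):
        steps = min(window_steps, N - m0)
        x1_0, x2_0 = x1_all[-1], x2_all[-1]       # start from the previous window's end
        x1, x2 = [x1_0] * (steps + 1), [x2_0] * (steps + 1)   # flat initial guess
        for _ in range(max_iter):
            new1, new2 = sweep(x1_0, x2_0, x2, m0, steps)
            work += steps                            # integration steps in this sweep
            diff = max(abs(a - b) for a, b in zip(new1 + new2, x1 + x2))
            x1, x2 = new1, new2
            if diff <= tol:
                break
        x1_all += x1[1:]
        x2_all += x2[1:]
    return x1_all, work


x1_ref = direct()
for w in (N, 20):
    x1, work = windowed_wr(w)
    err = max(abs(a - b) for a, b in zip(x1, x1_ref))
    print(f"window = {w:3d} steps: {work} integration steps, max |x1 - direct| = {err}")
```

Since the per-window sweep count here is tied to the explicit discretization, the printed work figures give only a rough feel for the trade-off described above and should not be read as representative of an implicit, multi-rate implementation.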
SECTION 6.2 - PARTITIONING LARGE SYSTEMS

In Algorithm 4.1, the system equations are solved as single differential equations in one unknown, and these solutions are iterated until convergence. If this kind of node-by-node decomposition strategy is used for systems with even just a few tightly coupled nodes, the WR algorithm will converge very slowly. As an example, consider the three-node circuit in Fig. 6.3a, a two-inverter chain separated by a resistor-capacitor network. In this case, the resistor-capacitor network is intended to model wiring delays, so the resistor has a large conductance compared to the other conductances in the circuit. The current equations for the system can be written down by inspection and are:

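$$\begin{aligned} C\dot{x}_1 + i_{m1}(x_1, v_{dd}) + i_{m2}(x_1, u) + g(x_1 - x_2) &= 0 \\ C\dot{x}_2 + g(x_2 - x_1) &= 0 \\ C\dot{x}_3 + i_{m3}(x_3, x_2) + i_{m4}(x_3, v_{dd}) &= 0 \end{aligned} \tag{6.20}$$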

Linearizing and normalizing time (so that the simulation interval $[0,T]$ is converted to $[0,1]$) yields a 3x3 linear equation:

$$x_1(0) = x_2(0) = 0, \qquad x_3(0) = 5$$

Algorithm 4.1 was used to solve the original nonlinear system. The input $u(t)$, the exact solution for $x_2$, and the first, fifth and tenth iteration waveforms generated by the WR algorithm for $x_2$ are plotted in Fig. 6.3b. As the plot indicates, the iteration waveforms for this example are converging very slowly. The reason for this slow convergence can be seen by examining the linearized system: $x_1$ and $x_2$ are tightly coupled by the small resistor modeling the wiring delay.

If Algorithm 4.1 is modified so that $x_1$ and $x_2$ are lumped together and solved directly, we get the following iteration equations:

[6.22]


The modified WR algorithm now converges in one iteration, because $x_3$ only depends on the "block" of $x_1$ and $x_2$, and that block is independent of $x_3$.

As  the  example  above  shows,  lumping  together  tightly  coupled  nodes  and  solving  them  directly 
can  greatly  improve  the  efficiency  of  the  WR  algorithm.  For  this  reason,  the  first  step  in  almost  every 
WR-based  program  is  to  partition  the  system,  to  scan  all  the  nodes  in  the  system  and  determine  which 
should  be  lumped  together  and  solved  directly.  Partitioning  "well"  is  difficult  for  several  reasons.  If 
too  many  nodes  are  lumped  together,  the  advantages  of  using  relaxation  will  be  lost,  but  if  any  tightly 
coupled  nodes  are  not  lumped  together  then  the  WR  algorithm  will  converge  very  slowly.  And  since 
the  aim  of  WR  is  to  perform  the  simulation  rapidly,  it  is  important  that  the  partitioning  step  not  be 
computationally  expensive. 

Several approaches have been applied to solve this partitioning problem. The first approach is to require the user to partition the system [15]. This technique is reasonable for the simulation of large digital integrated circuits because usually the large circuit has already been broken up into small, fairly independent pieces to make the design easier to understand and manage. However, what is a sensible partitioning from a design point of view may not be a good partitioning for the WR algorithm. For this reason, programs that require the user to partition the system sometimes perform a "sanity check" on the partitioning: a warning is issued if there are tightly coupled nodes that have not been lumped together.

A second approach to partitioning, also tailored to digital integrated circuits, is the functional extraction method [16]. In this method the equations that describe the system are carefully examined to try to find functional blocks (e.g., a NAND gate or a flip-flop). It is then assumed that nodes of the system that are members of the same functional block are tightly coupled, and they are therefore grouped together. This type of partitioning is difficult to perform, since the algorithm must recognize broad classes of functional blocks, or nonstandard blocks may not be treated properly. However, the functional extraction method can produce very good partitions because the relative importance of the coupling of the nodes can be accurately estimated.

Since the intent of the partitioning is to improve the speed of convergence of the relaxation, it is sensible to partition a large circuit with this, rather than topology or functionality, in mind. In this section we will develop an algorithm based on this idea. As it is difficult to get estimates of the speed of WR convergence directly, we will start with an exact analysis of a relaxation algorithm applied to a simple 2x2 linear algebraic example, and then lift the result to a heuristic for partitioning large linear algebraic problems. Then a relationship will be established between the convergence speed of the linear WR algorithm and that of two linear algebraic problems.

The following definition will be useful for describing the rate of convergence of relaxation algorithms.

Definition 6.3: Let $x^k \in \mathbb{R}^n$ be generated by the $k$th iteration of an algebraic relaxation algorithm applied to a system of the form $f(x) = 0$, where $x \in \mathbb{R}^n$ and $f: \mathbb{R}^n \to \mathbb{R}^n$. Then the $l_\infty$ iteration factor $\gamma_\infty$ is defined as the smallest positive number such that

$$\|x^{k+1} - x^{k}\|_\infty \le \gamma_\infty \, \|x^{k} - x^{k-1}\|_\infty$$

for any $k > 0$ and any bounded initial guess $x^0$. ■
Since the difference between the exact solution, $x$, and the result of the $k$th step of a relaxation, $x^k$, is at most $(\gamma_\infty)^k \|x - x^0\|_\infty$, the size of $\gamma_\infty$ is an indication of how fast the relaxation converges. If $\gamma_\infty$ is much less than 1, then the relaxation is certain to converge rapidly; but if $\gamma_\infty > 1$ the relaxation may not converge, and if $\gamma_\infty$ is close to 1 the convergence may be very slow.

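Consider the simple 2x2 matrix problem

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$ [6.23]

If the Gauss-Jacobi relaxation algorithm is used to solve Eqn. (6.23) (see Section 3.2), then the $l_\infty$ iteration factor is the $l_\infty$ induced norm of

$$\begin{bmatrix} 0 & -a_{12}/a_{11} \\ -a_{21}/a_{22} & 0 \end{bmatrix}$$ [6.24a]

which is

$$\gamma_\infty = \max\left( \left|\frac{a_{12}}{a_{11}}\right|, \left|\frac{a_{21}}{a_{22}}\right| \right)$$ [6.24b]

and if the Gauss-Seidel relaxation algorithm is used, then the $l_\infty$ iteration factor is the $l_\infty$ induced norm of

$$\begin{bmatrix} 0 & -a_{12}/a_{11} \\ 0 & a_{12}a_{21}/(a_{11}a_{22}) \end{bmatrix}$$ [6.25a]

which is

$$\gamma_\infty = \max\left( \left|\frac{a_{12}}{a_{11}}\right|, \left|\frac{a_{12}a_{21}}{a_{11}a_{22}}\right| \right)$$ [6.25b]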
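As a quick numerical check of Eqns. (6.24b) and (6.25b), the Python sketch below builds the Gauss-Jacobi and Gauss-Seidel iteration matrices of a 2x2 system from its entries $a_{ij}$ and takes their induced $l_\infty$ norms (maximum absolute row sum). The Gauss-Seidel matrix picks up the loop term $a_{12}a_{21}/(a_{11}a_{22})$ because $x_2$ is updated from the new $x_1$. The function names and the sample coefficients are illustrative assumptions, not taken from the text.

```python
# Illustrative sketch: l-infinity iteration factors for a 2x2 system
# [[a11, a12], [a21, a22]] x = b (compare Eqns. 6.24b and 6.25b).

def inf_norm(m):
    """Induced l-infinity norm: the maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in m)


def jacobi_matrix(a11, a12, a21, a22):
    """Gauss-Jacobi: each unknown is updated from the other's old value."""
    return [[0.0, -a12 / a11],
            [-a21 / a22, 0.0]]


def gauss_seidel_matrix(a11, a12, a21, a22):
    """Gauss-Seidel: x2 uses the *new* x1, which introduces the loop term."""
    return [[0.0, -a12 / a11],
            [0.0, a12 * a21 / (a11 * a22)]]


def gauss_seidel_sweep(a11, a12, a21, a22, b1, b2, x1, x2):
    """One Gauss-Seidel sweep on the 2x2 system."""
    x1_new = (b1 - a12 * x2) / a11
    x2_new = (b2 - a21 * x1_new) / a22
    return x1_new, x2_new


coeffs = (4.0, 1.0, 2.0, 5.0)  # example a11, a12, a21, a22
print(inf_norm(jacobi_matrix(*coeffs)))        # max(1/4, 2/5) -> 0.4
print(inf_norm(gauss_seidel_matrix(*coeffs)))  # max(1/4, 2/20) -> 0.25
```

Applying `gauss_seidel_sweep` twice and comparing successive differences confirms that the difference vector is propagated exactly by `gauss_seidel_matrix`.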
For the 2x2 linear system of Eqn. (6.23), Eqn. (6.24b) and Eqn. (6.25b) can be used to decide whether to use relaxation, or to lump the two nodes together and use direct methods. The criterion that $\left|a_{12}a_{21}/(a_{11}a_{22})\right|$ be small (much less than one), which we will refer to as the diagonally dominant loop criterion, has proved to be a useful heuristic for partitioning the large systems generated by circuit problems. For the linear algebraic problem

$$Ax = b$$ [6.26]

where $x = (x_1, \ldots, x_n)^T$, $b = (b_1, \ldots, b_n)^T$, $x, b \in \mathbb{R}^n$, $A \in \mathbb{R}^{n \times n}$ is invertible, and $A = (a_{ij})$, we have the following partitioning algorithm.

Algorithm 6.1 - Diagonally Dominant Loop Partitioning for Ax = b
for all ( i, j in N ) {
    if ( | a_ij a_ji / (a_ii a_jj) | > α ) { x_i is lumped with x_j }
}
■

The constant α is problem dependent and is roughly related to the desired $l_\infty$ iteration factor, so the smaller α is made, the more likely nodes are to be lumped together.
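Algorithm 6.1 can be sketched in Python as follows. The sketch assumes that lumping is transitive (if $x_1$ is lumped with $x_2$ and $x_2$ with $x_3$, all three end up in one block) and implements that with a small union-find structure. That choice, the function name, and the 3x3 example matrix are illustrative assumptions rather than details given in the text.

```python
def dd_loop_partition(a, alpha):
    """Sketch of Algorithm 6.1 for a dense matrix `a` (list of rows).

    Returns blocks of 0-based indices to be solved together. Lumping is
    treated as transitive (an assumption), implemented with union-find.
    """
    n = len(a)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            loop = abs(a[i][j] * a[j][i] / (a[i][i] * a[j][j]))
            if loop > alpha:               # tightly coupled: lump x_i with x_j
                parent[find(j)] = find(i)

    blocks = {}
    for i in range(n):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())


A = [[4.0, -3.0, 0.0],
     [-3.0, 4.0, -0.1],
     [0.0, -0.1, 4.0]]
print(dd_loop_partition(A, 0.1))  # [[0, 1], [2]]
```

Here the loop quantity for the pair (0, 1) is 9/16, well above α = 0.1, while the pair (1, 2) gives 0.01/16, so only the first two unknowns are lumped.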

Although Algorithm 6.1 works well for the matrices generated by a wide variety of circuit problems, it is only a heuristic. There are circuit examples for which the diagonally dominant loop criterion does not identify tightly coupled nodes that should be placed in the same partition. A particularly common circuit example for which Algorithm 6.1 does not lump tightly coupled nodes together is given in Fig. 6.4, an inverter driving a series of resistors. This is just a more complex version of the example given at the beginning of this section. The KCL equations for the circuit, approximating the inverter's output as a one-volt voltage source, are

$$\begin{aligned}
0.01x_1 + 10.0(x_1 - x_2) &= 0.01 \\
10.0(x_2 - x_1) + 1.0(x_2 - x_3) &= 0 \\
1.0(x_3 - x_2) + 10.0(x_3 - x_4) &= 0 \\
0.01x_4 + 10.0(x_4 - x_3) &= 0
\end{aligned}$$

or in matrix form,

$$\begin{bmatrix} 10.01 & -10.0 & 0.0 & 0.0 \\ -10.0 & 11.0 & -1.0 & 0.0 \\ 0.0 & -1.0 & 11.0 & -10.0 \\ 0.0 & 0.0 & -10.0 & 10.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0.01 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$ [6.27]

If Algorithm 6.1 is used to partition the matrix in Eqn. (6.27) with α = 0.1, then $x_1$ will be lumped with $x_2$, and $x_3$ will be lumped with $x_4$. The spectral radius of the iteration matrix generated by applying block Gauss-Seidel relaxation to the partitioned subsystems is ≈ 0.98. The spectral radius is a lower bound on the iteration factor in any norm, and since it is very close to one, the relaxation will converge slowly.
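The slow convergence can be observed directly. The Python sketch below applies block Gauss-Seidel to Eqn. (6.27) with the blocks $\{x_1, x_2\}$ and $\{x_3, x_4\}$ chosen by Algorithm 6.1, solving each 2x2 block directly, and reports the ratio of successive $l_\infty$ iteration differences. By a short hand calculation (ours, not from the text), the only nonzero eigenvalue of this block iteration matrix is $(10.01/10.11)^2 \approx 0.9803$, and the measured ratio settles on that value after the second sweep. The helper names are illustrative.

```python
# Eqn. (6.27): the inverter driving a series of resistors (Fig. 6.4).
A = [[10.01, -10.0, 0.0, 0.0],
     [-10.0, 11.0, -1.0, 0.0],
     [0.0, -1.0, 11.0, -10.0],
     [0.0, 0.0, -10.0, 10.01]]
b = [0.01, 0.0, 0.0, 0.0]


def solve2(m, r):
    """Direct solve of a 2x2 block by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]


def block_gauss_seidel_sweep(x):
    """Solve {x1, x2} using old x3, x4; then {x3, x4} using the new x1, x2."""
    r = [b[i] - A[i][2] * x[2] - A[i][3] * x[3] for i in (0, 1)]
    top = solve2([A[0][:2], A[1][:2]], r)
    r = [b[i] - A[i][0] * top[0] - A[i][1] * top[1] for i in (2, 3)]
    bottom = solve2([A[2][2:], A[3][2:]], r)
    return top + bottom


def diff_norm(u, v):
    return max(abs(p - q) for p, q in zip(u, v))


x_old = [0.0] * 4
x = block_gauss_seidel_sweep(x_old)
for _ in range(50):
    x_new = block_gauss_seidel_sweep(x)
    ratio = diff_norm(x_new, x) / diff_norm(x, x_old)
    x_old, x = x, x_new
print(round(ratio, 4))  # 0.9803
```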

The reason the diagonally dominant loop criterion sometimes produces misleading results is that it is too local a criterion: it only indicates how mutually coupled two nodes are compared to how coupled they are to other nodes in the problem. If two nodes are extremely tightly coupled, as are the pairs $x_1, x_2$ and $x_3, x_4$ in the example of Eqn. (6.27), then each node in the pair will appear relatively loosely coupled to the other nodes in the problem, even if it is coupled to those nodes tightly enough to slow the relaxation.

It is possible to modify the diagonally dominant loop partitioning algorithm so that it produces good partitions for problems that contain subsystems like the example of Eqn. (6.27). To demonstrate the algorithm, we consider a different approach to partitioning. Consider a problem of the form of Eqn. (6.26), $Ax - b = 0$, and define $\Lambda_{ii} = \partial x_i / \partial b_i$, which is just the $i$th diagonal term of $A^{-1}$. Then a new algorithm is generated by replacing $1/a_{ii}$ with $\Lambda_{ii}$ in Algorithm 6.1.

Algorithm 6.2 - Reduced System Partitioning for Ax = b
for all ( i in N ) { compute Λ_ii }
for all ( i, j in N ) {
    if ( | a_ij a_ji Λ_ii Λ_jj | > α ) { x_i is lumped with x_j }
}
■

A simple circuit interpretation can be given for the two partitioning algorithms based on Norton equivalents[36]. Using the diagonally dominant loop criterion directly to decide whether or not to lump node $x_2$ with $x_3$ amounts to examining a circuit in which the elements to the right of $x_2$ and to the left of $x_3$ have been replaced with a current source in parallel with a 0.1 ohm resistor to ground. Using the reduced system partitioning algorithm amounts to using the exact equivalent for the circuit in Fig. 6.4, that is, replacing the elements to the right of $x_2$ and to the left of $x_3$ with their Norton equivalent, a current source in parallel with a 100.1 ohm resistor to ground. The diagonally dominant loop test applied to this reduced system then indicates that $\gamma_\infty \approx 0.98$, which is identical to the spectral radius computed above.
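The arithmetic behind the two criteria for the $x_2$-$x_3$ pair of Fig. 6.4 fits in a few lines of Python. The sketch compares the loop quantity computed from the raw diagonal entries of Eqn. (6.27), as Algorithm 6.1 does, with the one computed from the Norton-equivalent reduced system (0.1 ohm in series with 100 ohms, i.e. 100.1 ohms, on each side of the 1 ohm coupling resistor). The helper name is ours.

```python
def loop_gain(a_ij, a_ji, d_i, d_j):
    """Diagonally dominant loop quantity |a_ij * a_ji / (d_i * d_j)|."""
    return abs(a_ij * a_ji / (d_i * d_j))


g23 = 1.0  # conductance between x2 and x3 (off-diagonal entries of Eqn. 6.27)

# Algorithm 6.1: diagonal entries a22 = a33 = 11.0 from Eqn. (6.27).
local = loop_gain(-g23, -g23, 11.0, 11.0)

# Reduced system: each side replaced by its Norton equivalent,
# a 100.1 ohm resistor to ground, plus the coupling conductance itself.
g_side = 1.0 / (0.1 + 100.0)
reduced = loop_gain(-g23, -g23, g_side + g23, g_side + g23)

print(round(local, 5))    # 0.00826
print(round(reduced, 4))  # 0.9803
```

With α = 0.1 the local test leaves $x_2$ and $x_3$ in separate partitions, while the reduced-system test lumps them.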

Of course, computing the inverse of A is a foolish approach to partitioning if the problem is to compute a matrix solution by relaxation. It is nonetheless a useful notion, because there are many cases where reasonable approximations to $\Lambda_{ii}$ can be computed easily, as we will demonstrate in Chapter 7.

Both the diagonally dominant loop and the reduced system criteria are heuristic techniques for partitioning linear algebraic systems. The next step is to lift the technique to an approach for partitioning differential systems of the form of Eqn. (2.2),

$$C\dot{x}(t) = Ax(t) + u(t), \qquad x(0) = x_0$$ [6.28]

where $C, A \in \mathbb{R}^{n \times n}$, C is nonsingular, and $x(t) \in \mathbb{R}^n$. We will start by presenting the waveform equivalent of the iteration factor.

Definition 6.4: Let $x^k: [0,T] \to \mathbb{R}^n$ be the function generated by the $k$th iteration of the WR algorithm applied to a system of the form of Eqn. (6.28). Then the WR $l_\infty$ uniform iteration factor, $\gamma^{WR}$, for the system is defined as the smallest positive number such that

$$\max_{[0,T]} \|x^{k+1}(t) - x^{k}(t)\|_\infty \le \gamma^{WR} \max_{[0,T]} \|x^{k}(t) - x^{k-1}(t)\|_\infty$$

for any $k > 0$, any continuously differentiable initial guess $x^0$, and any piecewise continuous input $u$. ■

There are two ways to reduce $\gamma^{WR}$. The first, discussed in Section 6.1, is to reduce the simulation interval $[0,T]$ until $\gamma^{WR}$ is less than one. The second approach is to partition the circuit into loosely coupled subsystems. A combination of the two techniques is needed to allow for reasonably large windows and reasonably small partitions.

As mentioned above, it is difficult to estimate $\gamma^{WR}$ directly for a given problem of the form of Eqn. (6.28). The following theorems relate $\gamma^{WR}$ to the iteration factors of simplified systems of equations.

Theorem 6.3: Let $\gamma^{WR}$ be the WR uniform iteration factor for a given system of equations of the form of Eqn. (6.28) solved on $[0,T]$. Then in the limit as $T \to \infty$, $\gamma^{WR}$ is bounded below by the spectral radius of $(L_a + D_a)^{-1}U_a$, where $L_a$, $D_a$, $U_a$ are the strictly lower triangular, diagonal, and strictly upper triangular portions of A given in Eqn. (6.28).

The theorem is simple to prove given the following lemma, the proof of which is given in [32].

Lemma 6.2: Let F be any linear map such that $y = Fx$, with $y, x: [0, \infty) \to \mathbb{R}^n$. Define $y(s)$, $x(s)$, $F(s)$ as the Laplace transforms of y, x, and F respectively. Then the spectral radius of the map F, $\rho(F)$, is equal to $\max_{s} \rho(F(s))$. ■
Proof of Theorem 6.3

Let $L_c$, $D_c$, $U_c$ be the strictly lower triangular, diagonal, and strictly upper triangular portions of C. Similarly, let $L_a$, $D_a$, $U_a$ be the strictly lower triangular, diagonal, and strictly upper triangular portions of A. Using this notation, the Gauss-Seidel WR iteration equation applied to solving Eqn. (6.28) is

$$(L_c + D_c)\dot{x}^{k+1}(t) + U_c\dot{x}^{k}(t) = (L_a + D_a)x^{k+1}(t) + U_a x^{k}(t) + u(t)$$ [6.29]

Define $\epsilon^{k}(t) = x^{k}(t) - x^{k-1}(t)$. Taking the difference between the $(k+1)$th and $k$th iterations of Eqn. (6.29) yields

$$(L_c + D_c)\dot{\epsilon}^{k+1}(t) + U_c\dot{\epsilon}^{k}(t) = (L_a + D_a)\epsilon^{k+1}(t) + U_a\epsilon^{k}(t)$$ [6.30]

Every iterate satisfies the initial condition $x(0) = x_0$, so $\epsilon^{k}(0) = 0$, and taking the Laplace transform of Eqn. (6.30) yields

$$s(L_c + D_c)\epsilon^{k+1}(s) + sU_c\epsilon^{k}(s) = (L_a + D_a)\epsilon^{k+1}(s) + U_a\epsilon^{k}(s)$$ [6.31]

Reorganizing, assuming the diagonal elements of C are nonzero,

$$\epsilon^{k+1}(s) = [s(L_c + D_c) - (L_a + D_a)]^{-1}(U_a - sU_c)\,\epsilon^{k}(s)$$ [6.32]

from which it can be seen that

$$F(s) = [s(L_c + D_c) - (L_a + D_a)]^{-1}(U_a - sU_c)$$ [6.33]

In particular,

$$F(0) = -(L_a + D_a)^{-1}U_a$$ [6.34]

which has the same spectral radius as $(L_a + D_a)^{-1}U_a$. Given Lemma 6.2, this proves the theorem. ■
Theorem 6.4: Let $\gamma^{WR}$ be the WR uniform iteration factor for a given system of equations of the form of Eqn. (6.28). Then $\gamma^{WR}$ is bounded below by the spectral radius of $(L_c + D_c)^{-1}U_c$, where $L_c$, $D_c$, $U_c$ are the strictly lower triangular, diagonal, and strictly upper triangular portions of C given in Eqn. (6.28).

Proof of Theorem 6.4

Algebraically reorganizing Eqn. (6.30),

$$\dot{\epsilon}^{k+1}(t) = -(L_c + D_c)^{-1}U_c\,\dot{\epsilon}^{k}(t) + (L_c + D_c)^{-1}(L_a + D_a)\,\epsilon^{k+1}(t) + (L_c + D_c)^{-1}U_a\,\epsilon^{k}(t)$$ [6.35]

Integrating Eqn. (6.35) and using the fact that $\epsilon(0) = 0$,

$$\epsilon^{k+1}(t) = -(L_c + D_c)^{-1}U_c\,\epsilon^{k}(t) + \int_0^t (L_c + D_c)^{-1}(L_a + D_a)\,\epsilon^{k+1}(\tau)\,d\tau + \int_0^t (L_c + D_c)^{-1}U_a\,\epsilon^{k}(\tau)\,d\tau$$ [6.36]

Since Eqn. (6.36) holds for all t, it holds as $t \to 0$, where the integral terms vanish and $\epsilon^{k+1}(t)$ approaches $-(L_c + D_c)^{-1}U_c\,\epsilon^{k}(t)$. This proves the theorem. ■

In Eqn. (6.28), C represents the matrix of linear capacitors, and A represents the net circuit currents generated by conductances. The two theorems above indicate that lower-bound estimates of $\gamma^{WR}$ can be obtained by examining separately a circuit containing only the capacitances and a circuit containing only the conductances. Since these estimates are lower bounds, decreasing $\gamma^{WR}$ below a desired α requires partitioning in such a way that the iteration factors for the Gauss-Seidel iteration applied to both algebraic systems are decreased below α.
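For a two-node system the lower bounds of Theorems 6.3 and 6.4 can be read off directly: the 2x2 Gauss-Seidel iteration matrix $(L+D)^{-1}U$ is upper triangular with diagonal entries 0 and $-m_{12}m_{21}/(m_{11}m_{22})$, so its spectral radius is $|m_{12}m_{21}/(m_{11}m_{22})|$. The Python sketch below applies this to the matrices C and A of a hypothetical two-node circuit, with grounded capacitors, a floating capacitor, grounded conductances, and a coupling conductance. All element values are made up for illustration. The conductive coupling alone suggests fast convergence, but the floating capacitor bounds $\gamma^{WR}$ from below by about 0.44, so these two nodes would have to be lumped for any α below that.

```python
def gs_spectral_radius_2x2(m):
    """Spectral radius of (L + D)^-1 U for a 2x2 matrix m.

    The 2x2 Gauss-Seidel iteration matrix is upper triangular with diagonal
    (0, -m01*m10/(m00*m11)), so no eigenvalue solver is needed.
    """
    return abs(m[0][1] * m[1][0] / (m[0][0] * m[1][1]))


# Hypothetical two-node circuit (illustrative values only):
# grounded capacitors c1, c2 and a floating capacitor cf between the nodes;
# grounded conductances g1, g2 and a coupling conductance g.
c1, c2, cf = 1.0, 1.0, 2.0
g1, g2, g = 1.0, 1.0, 0.1

C = [[c1 + cf, -cf], [-cf, c2 + cf]]
A = [[-(g1 + g), g], [g, -(g2 + g)]]  # C x' = A x + u, so A = -(conductance matrix)

bound_a = gs_spectral_radius_2x2(A)  # Theorem 6.3 (long intervals)
bound_c = gs_spectral_radius_2x2(C)  # Theorem 6.4
print(round(bound_a, 5), round(bound_c, 5))  # 0.00826 0.44444
```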
Figure 6.1b - WR Iteration from Cross-Coupled Nor Gate
Figure 6.2b - WR Iterations from Shift Register
Figure 6.3b - WR Iterations from Inverter with Delay
Figure 6.4 - Inverter Driving a Series of Resistors
CHAPTER 7 - THE IMPLEMENTATION OF WR IN RELAX2.3

In this chapter, the implementation of the WR algorithm in the RELAX2.3 program is described. We start with a brief overview of the steps performed by the RELAX2.3 program when simulating a circuit. A detailed description of the major steps is contained in the sections that follow.

The first step in simulating a circuit using the RELAX2.3 program is to create the circuit description file. In this file a user must specify device model parameters, circuit topology, analysis specifications, and plotting requests. The circuit topology can be described in as hierarchical or flat a form as the user desires[60]. This circuit description file is the input to the RELAX2.3 program, whose first step is to flatten the hierarchy.

Before applying the WR algorithm, the flattened circuit is decomposed into a collection of subcircuits. This is done by partitioning the circuit into clusters of tightly coupled nodes. Then the elements (e.g. transistors, resistors, capacitors) that connect to any of the nodes in a given cluster are gathered together to make the subcircuits. Once the entire circuit has been carved up into subcircuits, the subcircuits are ordered, or scheduled, starting with subcircuits that are connected to the user-defined inputs and then following the natural directionality of the circuit (as much as possible).

After a large circuit has been broken up into subcircuits, and these subcircuits have been ordered, the RELAX2.3 program begins the waveform relaxation process. An initial guess is made for each of the node voltage waveforms. Then the numerical solution for each of the subcircuits is computed in the order determined above. The computation is performed using a variable-timestep trapezoidal rule numerical integration algorithm, with local truncation error timestep control): 1]. To perform the numerical integration, those nodes in the subcircuit that were not part of the cluster around which the subcircuit was built are treated as external time-varying voltage sources. The values for the external voltage sources are either the initial guess waveforms or, if the subcircuit containing the external node was simulated previously, that computed waveform. As the node waveforms are computed, they replace the existing waveforms (initial guesses or previous iterations), and the process is repeated until the waveforms converge.
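The loop just described can be illustrated on a deliberately tiny example. The Python sketch below runs Gauss-Seidel WR on a hypothetical two-node linear RC ladder driven by a voltage step, with each node as its own subcircuit: node 1 is integrated with the previous node-2 waveform as an external source, then node 2 with the new node-1 waveform, until the waveforms stop changing. Unlike RELAX2.3, it uses a fixed-timestep trapezoidal rule with no truncation error control and no windowing, and all names and element values are illustrative assumptions. Because the elements are linear, the converged WR waveforms should match a direct (non-relaxed) trapezoidal solution of the coupled system, which the sketch also computes for comparison.

```python
def trap_scalar(c, a, s, v0, h):
    """Trapezoidal rule for c*v'(t) = -a*v(t) + s(t) on a uniform grid."""
    v = [v0]
    for n in range(len(s) - 1):
        v.append(((c - 0.5 * h * a) * v[-1] + 0.5 * h * (s[n] + s[n + 1]))
                 / (c + 0.5 * h * a))
    return v


def gauss_seidel_wr(u, h, c1, c2, gs, g12, g2, tol=1e-9, max_iter=100):
    """Gauss-Seidel WR for c1*v1' = gs*(u - v1) - g12*(v1 - v2),
                            c2*v2' = g12*(v1 - v2) - g2*v2, with v(0) = 0."""
    n = len(u)
    v1, v2 = [0.0] * n, [0.0] * n  # initial guess waveforms
    for k in range(1, max_iter + 1):
        # Node 1: v2 is an external source taken from the previous iteration.
        new_v1 = trap_scalar(c1, gs + g12, [gs * u[i] + g12 * v2[i] for i in range(n)], 0.0, h)
        # Node 2: v1 is an external source taken from the waveform just computed.
        new_v2 = trap_scalar(c2, g12 + g2, [g12 * new_v1[i] for i in range(n)], 0.0, h)
        change = max(abs(p - q) for p, q in zip(new_v1 + new_v2, v1 + v2))
        v1, v2 = new_v1, new_v2
        if change < tol:  # waveforms have stopped changing
            return v1, v2, k
    raise RuntimeError("WR did not converge")


def direct_trap(u, h, c1, c2, gs, g12, g2):
    """Reference: trapezoidal rule on the coupled system, no relaxation."""
    G = [[gs + g12, -g12], [-g12, g12 + g2]]
    M = [[c1 + 0.5 * h * G[0][0], 0.5 * h * G[0][1]],
         [0.5 * h * G[1][0], c2 + 0.5 * h * G[1][1]]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    v1, v2 = [0.0], [0.0]
    for n in range(len(u) - 1):
        # Right-hand side: (C - 0.5 h G) v_n + 0.5 h (s_n + s_{n+1})
        r1 = ((c1 - 0.5 * h * G[0][0]) * v1[-1] - 0.5 * h * G[0][1] * v2[-1]
              + 0.5 * h * gs * (u[n] + u[n + 1]))
        r2 = -0.5 * h * G[1][0] * v1[-1] + (c2 - 0.5 * h * G[1][1]) * v2[-1]
        v1.append((r1 * M[1][1] - M[0][1] * r2) / det)
        v2.append((M[0][0] * r2 - M[1][0] * r1) / det)
    return v1, v2


h = 0.01
u = [1.0] * 201  # 1 V step, simulated on [0, 2]
params = dict(c1=1.0, c2=1.0, gs=1.0, g12=0.5, g2=1.0)
v1, v2, iters = gauss_seidel_wr(u, h, **params)
r1, r2 = direct_trap(u, h, **params)
print(iters, max(abs(p - q) for p, q in zip(v1 + v2, r1 + r2)) < 1e-7)
```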

As mentioned in Chapter 6, the WR algorithm becomes inefficient when used to simulate digital circuits with logical feedback (e.g. finite state machines, ring oscillators) for many cycles, because the relaxation converges in a very nonuniform manner. For this reason the RELAX2.3 program does not actually perform the relaxation iterations by computing the transient behavior of each subcircuit for the entire user-defined simulation interval. Instead, the RELAX2.3 program uses a modified WR algorithm[17], in which the relaxation is performed for only a small piece of the user-defined simulation interval at a time. The size of this piece of the waveform, referred to as a window, is determined automatically at the beginning of every WR iteration.

When the WR algorithm is applied to very large circuits, some pieces of the circuit often converge much more rapidly than others. This phenomenon, called partial waveform convergence, can be exploited to improve the overall efficiency of the WR method. The details of the algorithm for avoiding recomputation of waveforms that have already converged are given in Section 7.5.

As a final point, it was mentioned in Chapter 5 that when the WR iteration equations are solved using a numerical integration algorithm, the resulting discretized WR algorithm is not guaranteed to converge unless the discretization error is driven to zero with the iterations. For this reason, the RELAX2.3 program tightens the local truncation error criterion used for selecting the numerical integration timesteps as the iterations in a given window progress.

SECTION 7.1 - PARTITIONING MOS CIRCUITS

As was shown in Section 6.2, the convergence of WR is greatly accelerated if groups of tightly coupled nodes are solved together as one subsystem or subcircuit. For this reason the RELAX2.3 program groups tightly coupled nodes into subcircuits before beginning the relaxation process. The algorithm used in the RELAX2.3 program to partition large MOS circuits is based on Algorithm 6.2 for partitioning linear algebraic systems, and on Theorems 6.3 and 6.4, which relate the problem of partitioning linear algebraic systems to that of partitioning linear differential systems.

MOS circuits are not linear, so the ideas presented in Section 6.2 must be modified if they are to be applied to nonlinear systems. The RELAX2.3 program uses several conservative heuristics (conservative in the sense that they tend to err on the side of producing larger-than-optimal subcircuits) to handle the nonlinear MOS transistors. The first heuristic is that each MOS transistor is initially treated as a nonlinear resistor between the transistor's source and drain, and the coupling between gate and source and between gate and drain is considered separately, during scheduling (see Section 7.2). With this simplification, the following algorithm for partitioning circuits with two-terminal linear and nonlinear resistances is applied.


Algorithm 7.1 - (Conductance Partitioning)
for each ( conductive element in the circuit ) {
    g3 ← maximum element conductance over all v
    Remove the element from the circuit.
    Replace each of the other conductances in the circuit by its minimum value over all v.
    Compute g1 and g2, the Norton equivalent conductances at the element terminals.
    if ( (g3 / (g1 + g3)) × (g3 / (g2 + g3)) > α ) {    Here, α is the desired WR iteration factor, typically 0.3.
        Tie the two terminal nodes together.
    }
}


Computing the Norton equivalent conductance, Geq, at a node can be performed using a simple recursive formula if there are no loops of conductances among only non-voltage-source nodes. Note that this recursion will not be very deep: it stops at any MOS transistor, because the minimum conductance of an MOS transistor is zero.


Algorithm 7.2 - (Norton Equivalent Conductance for Node i)
Geq ← 0.0
foreach ( conductive element incident at node i ) {
    G ← element conductance
    node_j ← the conductive element's other node
    if ( node_j is a voltage source node ) {
        Geq ← Geq + G
    }
    else {
        Geq_j ← Norton equivalent conductance at node_j with this element removed
        Geq ← Geq + (G × Geq_j) / (G + Geq_j)
    }
}


If the circuit does contain conductance loops among only non-voltage-source nodes, the above algorithm can still be used if the recursion is truncated in such a way that no circuit node is visited twice. In this case, only an estimate of the Norton equivalent will be computed.
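A compact Python sketch of Algorithms 7.1 and 7.2 follows. It assumes a hypothetical element list of `(node_a, node_b, g_min, g_max)` tuples and a set of voltage-source node names (ground counts as one). It applies the visited-node cutoff just described, so with conductance loops it returns only an estimate, and it skips elements touching a voltage-source node, an assumption of ours since such an element has only one node that could be tied. Applied to the chain of Fig. 6.4 (linear resistors, so `g_min == g_max`), it ties all four nodes together, and the coupling factor for the $x_2$-$x_3$ resistor reproduces the 0.98 obtained from the reduced system in Section 6.2.

```python
def norton_conductance(node, elements, vsrc, removed, visited=frozenset()):
    """Sketch of Algorithm 7.2: Norton equivalent conductance seen at `node`.

    elements: list of (node_a, node_b, g), g being the minimum conductance.
    vsrc:     set of voltage-source node names (ground included).
    removed:  frozenset of element indices treated as removed.
    visited:  nodes on the current recursion path; revisits are cut off.
    """
    visited = visited | {node}
    geq = 0.0
    for idx, (na, nb, g) in enumerate(elements):
        if idx in removed or node not in (na, nb) or g == 0.0:
            continue  # zero minimum conductance (e.g. an MOS device) ends the recursion
        other = nb if na == node else na
        if other in vsrc:
            geq += g
        elif other not in visited:
            geq_j = norton_conductance(other, elements, vsrc, removed | {idx}, visited)
            geq += g * geq_j / (g + geq_j)  # series combination
    return geq


def coupling_factor(idx, elements, vsrc):
    """Quantity tested in Algorithm 7.1: g3/(g1 + g3) * g3/(g2 + g3)."""
    na, nb, _, g3 = elements[idx]  # g3: maximum conductance of this element
    if g3 == 0.0:
        return 0.0
    mins = [(a, b, gmin) for a, b, gmin, _ in elements]
    g1 = norton_conductance(na, mins, vsrc, frozenset({idx}))
    g2 = norton_conductance(nb, mins, vsrc, frozenset({idx}))
    return (g3 / (g1 + g3)) * (g3 / (g2 + g3))


def conductance_partition(elements, vsrc, alpha=0.3):
    """Sketch of Algorithm 7.1: the node pairs to tie together."""
    ties = []
    for idx, (na, nb, _, _) in enumerate(elements):
        if na in vsrc or nb in vsrc:
            continue  # assumption: nothing to tie to a voltage-source node
        if coupling_factor(idx, elements, vsrc) > alpha:
            ties.append((na, nb))
    return ties


# Fig. 6.4 as (node_a, node_b, g_min, g_max); "vin" is the inverter output.
fig_6_4 = [("vin", "x1", 0.01, 0.01), ("x1", "x2", 10.0, 10.0),
           ("x2", "x3", 1.0, 1.0), ("x3", "x4", 10.0, 10.0),
           ("x4", "gnd", 0.01, 0.01)]
sources = {"vin", "gnd"}
print(conductance_partition(fig_6_4, sources))
# [('x1', 'x2'), ('x2', 'x3'), ('x3', 'x4')]
print(round(coupling_factor(2, fig_6_4, sources), 4))  # 0.9803
```

The capacitance partitioning described below could reuse the same functions with equivalent capacitances in place of conductances.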

The conductance partitioning algorithm is justified by Theorem 6.3: the WR iteration factor is bounded below by the iteration factor for solving just the algebraic portion of the problem. Theorem 6.4 suggests that an algorithm analogous to Algorithm 7.1 be constructed for the capacitive elements in the circuit. Since the capacitance problem is almost identical in nature to the conductance problem, a capacitance partitioning algorithm can follow almost the same strategy as conductance partitioning. The difference is that floating capacitances are compared to equivalent capacitances rather than to Norton equivalent conductances. These equivalent capacitances are entirely analogous to the equivalent conductances, and can be computed using the same recursive approach as in Algorithm 7.2.

The RELAX2.3 program uses both conductance and capacitance partitioning, and forms subcircuits from the union of the two results. The algorithm has been applied to a wide variety of MOS digital circuits, including a large VHSIC memory circuit with 2900 nodes and over 3500 parasitic components. The results have always matched the best attempts at hand partitioning, in as many instances as we had the patience to check. However, if the method is applied to larger problems, the subcircuits produced may become quite large. Should this be the case, the present simple algorithm could be extended so that an additional pass is made over only the excessively large subcircuits, subpartitioning them using more sophisticated algorithms. In particular, such a pass could use better estimates of the equivalent conductances and capacitances, as the present algorithm may be unnecessarily conservative.

SECTION 7.2 - ORDERING THE SUBSYSTEM COMPUTATION

When applying the Gauss-Seidel WR algorithm to a decomposed system of differential equations, the order in which the equations are solved can strongly affect the number of WR iterations required to achieve satisfactory convergence. To explain this effect, consider the case where there are only grounded two-terminal capacitors at each node of the circuit, so that the matrix C(x,u) of Eqn. (2.2) is diagonal. Then let the dependency matrix of f in Eqn. (2.2) be defined as a zero-one matrix $P = [p_{ij}]$ such that $p_{ij} = 1$ if $f_i$ depends on $x_j$, and $p_{ij} = 0$ otherwise. Note that P also represents the zero-nonzero structure of the Jacobian of f.

If P is lower triangular, then one iteration of the Gauss-Seidel WR algorithm will produce the exact solution to the original differential equation system (in practice, two iterations will be performed, because a second iteration is needed to verify that convergence has been achieved). If P is not lower triangular, but the dependence of the $i$th component of f on $x_j$, $i < j$, is "weak", then the result of one iteration of the Gauss-Seidel WR algorithm will be close to the exact solution, and subsequent iterations will converge rapidly. For this reason, when applying relaxation techniques to the solution of circuit equations, the technique can be made much more efficient by reordering the equations to make P as close to a lower triangular matrix as possible.

As discussed in Section 6.2, subsets of nodes in a large circuit may be mutually tightly coupled, and to ensure that the relaxation algorithm converges rapidly when applied to such a circuit, these subsets are grouped together into subcircuits and solved with direct methods. This corresponds to a block relaxation method, and an ordering algorithm applied to a system being solved with block relaxation should attempt to make P as close to block lower triangular as possible.

In some sense, partitioning and ordering the subsystems of equations perform similar functions: both attempt to eliminate slow relaxation convergence due to two nodes in a large circuit being tightly coupled. There is, however, a key difference. If, for example, $x_i$ is strongly dependent on $x_j$ and $x_j$ is strongly dependent on $x_i$, then a partitioning algorithm should lump the two nodes together into one subcircuit. However, if $x_i$ is strongly dependent on $x_j$ but $x_j$ is only weakly dependent on $x_i$, then node i and node j should not be lumped together; instead, the ordering algorithm should ensure that the system is block lower triangular by ordering the equations so that $x_j$ is computed before $x_i$.

Resistors and capacitors do not exhibit the kind of unidirectional coupling that is of concern to the ordering algorithm. In fact, the only element type of concern to the ordering algorithm is the transistor, because transistors exhibit unidirectional coupling. That is, the drain and source terminals of an MOS transistor are strongly dependent on the gate terminal of the transistor, but the gate is almost independent of the drain and source. Clearly, this implies that the subcircuits containing a given transistor's drain or source should be analyzed after the subcircuit containing that transistor's gate.

To devise an algorithm to carry out this task, it is convenient to introduce the dependency graph of the partitioned circuit. We represent the circuit with a directed graph G(X, E), where the set of nodes X is in one-to-one correspondence with the subcircuits obtained by a partitioner, and where there is a directed edge from the node corresponding to subcircuit i to the node corresponding to subcircuit j if there is a transistor whose gate is in subcircuit i and whose drain or source is in subcircuit j. If the graph is acyclic, it can be levelized, i.e. all the nodes can be ordered in levels so that a node in level i can have incoming edges only from nodes in levels lower than i. The ordering so obtained is the one used by RELAX2.3 to process the subcircuits.
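Levelizing an acyclic dependency graph can be sketched in Python as below: each pass collects every subcircuit whose predecessors have all been placed, so each subcircuit lands one level above its highest predecessor. Subcircuit names and the edge list are hypothetical. Edges from a subcircuit to itself (a transistor whose gate and drain lie in the same subcircuit) are ignored, which is our assumption, since such a transistor imposes no ordering between subcircuits.

```python
def levelize(subcircuits, edges):
    """Group subcircuits into levels so every edge goes from a lower level
    to a higher one.

    edges: iterable of (i, j): a transistor gate in subcircuit i drives a
           drain or source in subcircuit j.
    Raises ValueError if the dependency graph has a cycle.
    """
    preds = {s: set() for s in subcircuits}
    for i, j in edges:
        if i != j:  # a transistor inside one subcircuit adds no constraint
            preds[j].add(i)
    placed = set()
    levels = []
    while len(placed) < len(preds):
        ready = sorted(s for s in preds if s not in placed and preds[s] <= placed)
        if not ready:
            raise ValueError("cycle in dependency graph")
        levels.append(ready)
        placed.update(ready)
    return levels


print(levelize(["a", "b", "c", "d"],
               [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# [['a'], ['b', 'c'], ['d']]
```

Processing the levels in order gives a valid subcircuit schedule. Within a level, subcircuits do not depend on one another, so their relative order is arbitrary (sorted here only for determinism).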

However, there may be cases where cycles exist in the graph. In this case, either the subcircuit definitions are changed by grouping two or more subcircuits together, effectively performing part of the partitioning task (as alluded to in Section 7.1), or edges of the graph are discarded to remove the cycles. In either case, at the end of this process an acyclic graph, and an ordering of the subcircuits corresponding to the levelization of the (perhaps altered) graph, is obtained.
One question remains: when to repartition to remove a feedback loop, and when to break the loop instead. As the example in Section 6.1 indicates, if signal propagation around the feedback loop is fast compared to the size of the window, the relaxation convergence will be slow and nonuniform. For this reason, the ordering algorithm makes the decision about partitioning based on an estimate of the delay around the feedback loop. If the delay is smaller than one percent (a somewhat arbitrary choice) of the simulation interval, the feedback loop is removed by repartitioning. If the delay is larger, then the feedback loop is broken by removing an edge from the directed graph.


Algorithm 7.3 - (RELAX2.3 Subcircuit Ordering Algorithm)

Initialization.
ordered_list ← NULL;
unordered_list ← list of subcircuits from the partitioner;

Main Loop.
while ( unordered_list ≠ NULL ) {
    none_ordered ← FALSE;
    while ( none_ordered == FALSE ) {
        none_ordered ← TRUE;
        for each ( subcircuit in the unordered_list ) {
            if ( all subcircuits on incoming arcs are on ordered_list ) {
                none_ordered ← FALSE;
                append_to_end_of_ordered_list(subcircuit);
                delete_from_unordered_list(subcircuit);
            }
        }
    }
    if ( unordered_list ≠ NULL ) {    Must be a feedback loop.
        found_loop ← FALSE;
        depth ← 1;
        while ( found_loop == FALSE ) {
            depth ← depth + 1;
            for each ( subcircuit in the unordered_list ) {
                if ( there exists a loop of length = depth ) {
                    found_loop ← TRUE;
                    if ( delay around the loop > 0.01 * the simulation interval ) {
                        break the loop;
                    }
                    else {
                        collapse the loop into one subcircuit;
                    }
                }
            }
        }
    }
}
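The Python sketch below follows the structure of Algorithm 7.3 under several stated assumptions. Subcircuits are represented as frozensets of original names, so that collapsing a loop simply merges them. The increasing-`depth` search is replaced by a breadth-first search for a shortest loop among the unordered subcircuits, which finds a loop of the smallest length. "Breaking" a loop is taken to mean dropping the edges that close it. The loop-delay estimator is a hypothetical caller-supplied function, since the text does not specify how RELAX2.3 estimates the delay. The sketch handles one loop per pass rather than every loop of a given length.

```python
from collections import deque


def order_subcircuits(names, edges, loop_delay, sim_interval):
    """Sketch of Algorithm 7.3; returns the ordered list of subcircuits.

    names:        subcircuit names from the partitioner.
    edges:        set of (i, j): a transistor gate in i drives a drain/source in j.
    loop_delay:   hypothetical callable estimating the delay around a loop.
    sim_interval: length of the simulation interval.
    """
    group_of = {n: frozenset([n]) for n in names}
    edges = {(i, j) for i, j in edges if i != j}

    def preds(g):
        return {group_of[i] for i, j in edges if j in g and i not in g}

    def succs(g):
        return {group_of[j] for i, j in edges if i in g and j not in g}

    def shortest_loop(pool):
        best = None
        for start in sorted(pool, key=sorted):
            parent = {start: None}
            queue = deque([start])
            while queue:
                g = queue.popleft()
                if start in succs(g):  # edge g -> start closes a loop
                    loop = [g]
                    while parent[loop[-1]] is not None:
                        loop.append(parent[loop[-1]])
                    loop.reverse()  # start, ..., g
                    if best is None or len(loop) < len(best):
                        best = loop
                    break
                for nxt in sorted(succs(g), key=sorted):
                    if nxt in pool and nxt not in parent:
                        parent[nxt] = g
                        queue.append(nxt)
        return best

    ordered = []
    unordered = set(group_of.values())
    while unordered:
        progress = True
        while progress:  # order everything whose inputs are already ordered
            progress = False
            for g in sorted(unordered, key=sorted):
                if preds(g) <= set(ordered):
                    ordered.append(g)
                    unordered.discard(g)
                    progress = True
        if unordered:  # must be a feedback loop
            loop = shortest_loop(unordered)
            if loop_delay(loop) > 0.01 * sim_interval:
                # Break the loop: drop the edges closing it (last -> first).
                edges -= {(i, j) for i, j in edges if i in loop[-1] and j in loop[0]}
            else:
                merged = frozenset().union(*loop)  # collapse into one subcircuit
                unordered -= set(loop)
                unordered.add(merged)
                for n in merged:
                    group_of[n] = merged
    return ordered


result = order_subcircuits(["a", "b", "c"], {("a", "b"), ("b", "c"), ("c", "b")},
                           lambda loop: 0.0, 10.0)
print([sorted(g) for g in result])  # [['a'], ['b', 'c']]
```

With a fast loop (`loop_delay` returning 0.0) the b-c loop is collapsed into one subcircuit; with a slow one (for example, a constant 1.0 against a 10.0 interval) the loop is broken and b is ordered before c.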


SECTION 7.3 - COMPUTATION OF THE SUBSYSTEM WAVEFORMS

As in standard circuit simulators, the RELAX2.3 program solves Eqn. (4.4) using a numerical integration method with varying timesteps. Since the major aim of the RELAX2.3 program is to simulate digital circuits, the integration method was chosen based on how effectively it solves problems with the properties of digital circuits. Digital circuits are very stiff, so only A-stable integration methods were considered. In addition, digital circuits contain very rapid transitions, and low-order one-step integration methods are usually suggested for such problems. Although the backward-Euler method is computationally the simplest A-stable one-step method, the trapezoidal rule, an A-stable second-order one-step method, was chosen instead because of its better accuracy.

There is a second important reason for choosing the trapezoidal integration algorithm over the implicit-Euler formula. If the WR algorithm is used to solve the system, and a numerical integration method is used to solve the WR iteration equations, then the upper bound on the timestep that guarantees WR convergence (see Chapter 5) is a function of the integration method. This timestep constraint is larger for the trapezoidal rule than for implicit-Euler. To show this, consider the simple case of the WR algorithm applied to Eqn. (2.2) with $C(v,u) = I$, that is, assume all the capacitors are linear, grounded, and unity. The WR iteration equations become

$$\dot{v}^{k+1} = \dot{v}^{k} + f(v^{k+1}, v^{k}, u) \qquad [7.1]$$

where $f$ is as defined in Section 4.2 and $h$ is the timestep. Now consider computing the first timestep of the implicit-Euler discretized WR algorithm:

$$v^{k+1}(h) - v_0 = \bigl(v^{k}(h) - v_0\bigr) + h\,f\bigl(v^{k+1}(h), v^{k}(h), u\bigr) \qquad [7.2]$$


Applying  the  trapezoidal  rule  yields: 
$$v^{k+1}(h) - v_0 = \bigl(v^{k}(h) - v_0\bigr) + 0.5h\,f\bigl(v^{k+1}(h), v^{k}(h), u\bigr) + 0.5h\,f(v_0, u) \qquad [7.3]$$

The purpose of the relaxation iteration is to resolve the term $f(v^{k+1}(h), v^{k}(h), u)$. That term is weighted by $0.5h$ in Eqn. (7.3) but by $h$ in Eqn. (7.2), so it plays a smaller role in Eqn. (7.3). As a result, the iteration of Eqn. (7.3) will reach a given convergence threshold faster.
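The effect of the $h$ versus $0.5h$ weighting can be seen in a small experiment. The sketch below assumes a hypothetical two-node circuit with unit grounded capacitors, $\dot v_1 = -v_1 + c\,v_2$ and $\dot v_2 = -v_2 + c\,v_1$. It uses the simple Gauss-Jacobi splitting $\dot v^{k+1} = f(v^{k+1}, v^{k})$ rather than the exact form of Eqn. (7.1), and counts the relaxation sweeps needed to converge at the first timepoint $t = h$. For this system, the per-sweep contraction factor is $hc/(1+h)$ for backward Euler and $0.5hc/(1+0.5h)$ for the trapezoidal rule, so the trapezoidal iteration should need fewer sweeps.

```python
def first_step_wr_iterations(method, h, c, v0=(1.0, 0.0), tol=1e-10,
                             max_iter=1000):
    """Count Gauss-Jacobi WR sweeps needed to converge at t = h.

    Hypothetical test system (unit grounded capacitors):
        dv1/dt = -v1 + c*v2,   dv2/dt = -v2 + c*v1
    Each node's own term is implicit; the other node's value at t = h is
    taken from the previous WR iterate.
    """
    a, b = v0
    x, y = a, b  # iterate 0: waveforms held at their initial values
    for k in range(1, max_iter + 1):
        if method == "backward_euler":
            # x - a = h*(-x + c*y_old)
            nx = (a + h * c * y) / (1.0 + h)
            ny = (b + h * c * x) / (1.0 + h)
        elif method == "trapezoidal":
            # x - a = 0.5h*(-x + c*y_old) + 0.5h*(-a + c*b)
            nx = (a * (1 - 0.5 * h) + 0.5 * h * c * (b + y)) / (1.0 + 0.5 * h)
            ny = (b * (1 - 0.5 * h) + 0.5 * h * c * (a + x)) / (1.0 + 0.5 * h)
        else:
            raise ValueError(method)
        if max(abs(nx - x), abs(ny - y)) < tol:
            return k
        x, y = nx, ny
    return max_iter


print(first_step_wr_iterations("backward_euler", 1.0, 1.0),
      first_step_wr_iterations("trapezoidal", 1.0, 1.0))
```

With $h = c = 1$, the contraction factors are $1/2$ and $1/3$, so the trapezoidal count is noticeably smaller.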

Given  a  timestep  h,  the  trapezoidal  integration  method  applied  to  Eqn.  (4.4)  yields: 

$$q(t+h) - q(t) - 0.5h\,\bigl[f(q(t+h), u) + f(q(t), u)\bigr] = 0 \qquad [7.4]$$

The  above  equation  is  a  nonlinear  algebraic  equation  in  q.  The  user  is  usually  more  interested 
in  the  voltage,  so  before  solving  Eqn.  (7.4)  we  substitute  for  q  in  terms  of  v. 

$$q(v(t+h)) - q(v(t)) - 0.5h\,\bigl[f(v(t+h), u) + f(v(t), u)\bigr] = 0 \qquad [7.5]$$

In Eqn. (7.5), $v(t)$ and $q(t)$ are known, and the equation must be solved to compute $v(t+h)$. Nonlinear algebraic systems generated by integration methods are usually solved using the iterative Newton-Raphson method. Newton methods are preferred because they have quadratic convergence properties. They are also guaranteed to converge if the initial guess is close enough to the correct solution (provided the Jacobian is nonsingular there). The general Newton-Raphson iteration equation to solve $F(x) = 0$ is

$$J_F(x^{k-1})\,(x^{k} - x^{k-1}) = -F(x^{k-1}) \qquad [7.6]$$

where $J_F$ is the Jacobian of $F$ with respect to $x$. The iteration is continued until $\|x^{k} - x^{k-1}\| < \epsilon$ and $F(x^{k})$ is close enough to 0. If the Newton algorithm is used to solve Eqn. (7.5) for $v(t+h)$, the residue, $F(v^{k}(t+h))$, is:

$$F(v^{k}(t+h)) = q(v^{k}(t+h)) - q(v(t)) - 0.5h\,\bigl[f(v^{k}(t+h), u) + f(v(t), u)\bigr] \qquad [7.7]$$


and the Jacobian of $F$, evaluated at $v^{k}(t+h)$, is:

$$J_F(v^{k}(t+h)) = C(v^{k}(t+h), u) - 0.5h\,\frac{\partial f(v^{k}(t+h), u)}{\partial v}$$


Then $v^{k+1}(t+h)$ is derived from $v^{k}(t+h)$ by solving the linear system of equations:

$$J_F(v^{k}(t+h))\,\bigl[v^{k+1}(t+h) - v^{k}(t+h)\bigr] = -F(v^{k}(t+h)) \qquad [7.8]$$

The Newton iteration is continued until sufficient convergence is achieved, that is, until $\|v^{k+1}(t+h) - v^{k}(t+h)\| < \epsilon$ and $F(v^{k}(t+h))$ is close enough to zero.
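One trapezoidal step solved by Newton's method, following Eqns. (7.7) and (7.8), can be sketched for a single node. The circuit is hypothetical and all parameter values are made up. It consists of a nonlinear capacitor with $q(v) = C_0 v + \tfrac12 C_1 v^2$, driven through a resistor $R$ from a source $u$ and loaded by a cubic leakage current. Differentiating the residual of Eqn. (7.7) gives the Jacobian $C(v) - 0.5h\,\partial f/\partial v$ with $C = dq/dv$, which is what the code uses.

```python
C0, C1 = 1e-12, 0.2e-12  # capacitor parameters: F, F/V (hypothetical)
R, G3 = 1e4, 1e-6        # series resistance (ohms), cubic leak (A/V^3)


def q(v):
    """Node charge as a function of node voltage."""
    return C0 * v + 0.5 * C1 * v * v


def cap(v):
    """C(v) = dq/dv."""
    return C0 + C1 * v


def f(v, u):
    """Current into the node: resistor from source u minus a cubic leak."""
    return (u - v) / R - G3 * v ** 3


def dfdv(v, u):
    return -1.0 / R - 3.0 * G3 * v * v


def trap_step(v_t, u_t, u_next, h, tol=1e-9, ftol=1e-18, max_iter=50):
    """Advance dq/dt = f by one trapezoidal step, solving F(v) = 0 by Newton."""
    vk = v_t  # initial Newton guess: the previous timepoint
    for _ in range(max_iter):
        F = q(vk) - q(v_t) - 0.5 * h * (f(vk, u_next) + f(v_t, u_t))  # Eqn 7.7
        J = cap(vk) - 0.5 * h * dfdv(vk, u_next)                      # Jacobian
        dv = -F / J                                                   # Eqn 7.8
        vk += dv
        if abs(dv) < tol and abs(F) < ftol:
            return vk
    raise RuntimeError("Newton did not converge")


# Step response: u jumps to 1 V; march 2000 steps of 1 ns (RC = 10 ns).
v = 0.0
for _ in range(2000):
    v = trap_step(v, 1.0, 1.0, 1e-9)
print(v)
```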

Each iteration of the Newton algorithm requires a function evaluation, a Jacobian evaluation, and a matrix solution. For the algebraic systems generated by the numerical integration of MOS digital circuits, it is often inefficient to evaluate the Jacobian at every Newton iteration. If the Jacobian is reevaluated only every few Newton iterations[27], the number of iterations required to achieve convergence is usually unchanged and the computation required is significantly reduced. Not only are Jacobian evaluations skipped, but if the matrix solution is computed by LU factorization[40], subsequent matrix solutions using the same matrix can skip the LU factorization step. In the RELAX2.3 program the Jacobian is evaluated every third iteration. This choice is based on the experimental evidence from several examples given in the table below.


TABLE 7.1 - CPU TIME VS # OF NEWTON ITERATIONS/JACOBIAN EVALUATION

                              Newton iterations per Jacobian evaluation
Circuit        Devices        1          2          3          4
Ring Osc.      7              0.95s      0.77s      0.71s      0.75s
Oper. Amp      25             6.28s      5.2s       4.52s      4.67s
flip-flop      33             20.47s     16.82s     13.93s     13.67s
CMOS Memory    621            1080s      976s       885s       886s

On VAX 11/780 running Unix
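The Jacobian-reuse strategy can be sketched with a scalar Newton iteration that re-evaluates the derivative only every `jac_every` iterations. Between re-evaluations it runs as a chord, or modified Newton, iteration. This is an illustrative sketch with hypothetical names, not RELAX2.3's solver. In a circuit simulator, the reused quantity would be the LU factorization of the Jacobian matrix rather than a scalar derivative.

```python
def modified_newton(F, J, x0, jac_every=3, tol=1e-12, max_iter=100):
    """Scalar Newton iteration that evaluates the Jacobian only on
    iterations 1, 1 + jac_every, 1 + 2*jac_every, ... and reuses it
    in between.

    Returns (root, iterations, jacobian_evaluations).
    """
    x = x0
    jac = None
    n_jac = 0
    for it in range(max_iter):
        if it % jac_every == 0:
            jac = J(x)  # fresh Jacobian (in a simulator: new LU factors)
            n_jac += 1
        dx = -F(x) / jac  # reuse the stored Jacobian otherwise
        x += dx
        if abs(dx) < tol:
            return x, it + 1, n_jac
    raise RuntimeError("no convergence")


print(modified_newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5, jac_every=1))
print(modified_newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5, jac_every=3))
```

On $x^2 - 2 = 0$ starting from $x_0 = 1.5$, re-evaluating the derivative every third iteration needs two derivative evaluations instead of five, at the cost of one extra iteration. The timings in Table 7.1 suggest a similar trade-off for the circuit examples.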

The integration method used in the SPICE2 program is very similar to the direct method used in RELAX2.3. Both use the trapezoidal integration formula with local truncation error timestep control, the Newton method to solve the algebraic system, and sparse LU factorization to perform the matrix solution. However, as can be seen from Table 7.2, the RELAX2.3 program, using the direct method described above, is eight to twenty times faster than the SPICE2 program. This can be attributed to many factors. The first is the implementation language: RELAX2.3 is written in "C" and SPICE2 in FORTRAN, and "C" programs under the UNIX operating system run almost twice as fast as FORTRAN programs. The remaining factor of four to ten is due to more sophisticated programming techniques, the more efficient equation formulation, the modified Newton method mentioned above, and better numerical integration error control.


TABLE 7.2 - RELAX2.3 (DIRECT) VS SPICE2 ON INDUSTRIAL CIRCUITS

Circuit          Devices    SPICE2      RELAX2.3    Ratio
Ring Osc.        7          17s         0.75s       22
Op-amp           25         42s         5s          8
uP Control       232        1400s       90s         15
CMOS Memory      621        10400s      995s        10
4-bit Counter    259        4300s       540s        8
Encode-Decode    1326       115,840s    5000s       23

On VAX 11/780 running Unix


It should be pointed out that no fundamentally new circuit simulation method was needed. Simply by carefully exploiting some very general properties of MOS digital circuits, almost an order-of-magnitude decrease in computation time has been achieved over the much more general SPICE2 program.


SECTION  7.4  -  WINDOWSIZE  DETERMINATION 

As mentioned in Section 6.2, the WR algorithm used in RELAX2.3 becomes inefficient when used to simulate digital circuits with logical feedback (e.g., finite state machines, ring oscillators) for many cycles. However, the WR algorithm can still be very efficient if the relaxation is performed on only one piece of the waveform at a time. For general circuits, the ideal would be to break the simulation interval into windows over which every time point of the iteration waveform moves closer to the correct solution. However, if the windows are too small, some of the advantages of waveform relaxation are lost:

- A digital circuit's natural latency can be exploited only within each window rather than over the entire waveform.
- The scheduling overhead increases as the windows become smaller, because each circuit lump must be scheduled once for each window.
- If the windows are made very small, the timesteps chosen to calculate the waveforms will be limited by the window size rather than by the discretization error, and unnecessary calculations will be performed.

Rather than using a conservative a priori lower bound such as the one given in Theorem 6.2, the RELAX2.3 program determines the "windowsize" dynamically, using two criteria. The first criterion is to pick
the  windowsize  to  limit  the  number  of  timepoints  required  to  represent  each  node  waveform  in  a 
window.  This  puts  a  strict  upper  bound  on  the  amount  of  storage  needed  for  the  waveforms,  and  thus 
allows  the  RELAX2.3  program  to  avoid  dynamically  managing  waveform  storage  space.  The  second 
criterion  is  to  try  to  pick  the  windowsize  so  that  the  convergence  of  the  WR  is  rapid,  in  particular,  that 
the  waveforms  approach  the  correct  solution  in  a  uniform  manner  over  the  entire  window.  The 
RELAX2.3  program  presently  uses  the  following  windowsize  determination  algorithm: 

Algorithm 7.4 (RELAX2.3 Windowing Algorithm)
starttime = Beginning of the window
stoptime = End of the window
endtime = End of user-defined simulation interval
usedpts = Max. # of points used in the last window
maxpts = Max. # of points in a waveform buffer
prevwindow = Size of the window used in the previous iteration
numiters = Number of WR iterations performed
if ( not entirely converged in this window ) then {
    if ( usedpts > maxpts ) then {
        /* Shorten window if the waveforms overran storage buffers. */
        stoptime = starttime + (prevwindow * maxpts * 0.7) / usedpts;
    } else if ( (numiters mod 5) == 0 ) then {
        /* Halve windowsize every five WR iterations. */
        stoptime = starttime + prevwindow / 2;
    } else {
        /* Else just do the same window again. */
        stoptime = starttime + prevwindow;
    }
} else {
    starttime = stoptime;
    stoptime = starttime + (prevwindow * maxpts * 0.7) / usedpts;
}
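A direct Python transcription of the window update in Algorithm 7.4 makes its three cases concrete. The function name and argument list are hypothetical. The 0.7 buffer factor and the halving every five iterations are taken from the algorithm. The algorithm does not show how `endtime` bounds the last window, so that step is omitted here as well.

```python
def next_window(starttime, stoptime, prevwindow, usedpts, maxpts,
                numiters, converged):
    """One decision of the windowing rule; returns (starttime, stoptime)."""
    if not converged:
        if usedpts > maxpts:
            # Waveforms overran their buffers: shrink the window.
            stoptime = starttime + (prevwindow * maxpts * 0.7) / usedpts
        elif numiters % 5 == 0:
            # Convergence is slow: halve the window every five WR iterations.
            stoptime = starttime + prevwindow / 2
        else:
            # Otherwise retry the same window.
            stoptime = starttime + prevwindow
    else:
        # Converged: advance, sizing the next window so that it should fill
        # about 70% of the waveform buffer.
        starttime = stoptime
        stoptime = starttime + (prevwindow * maxpts * 0.7) / usedpts
    return starttime, stoptime


print(next_window(0.0, 10.0, 10.0, usedpts=50, maxpts=100, numiters=3,
                  converged=True))
```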
At present, one twentieth of the simulation interval is used as the initial guess for the windowsize. Adding a simple critical path analyzer to RELAX2.3 to provide a better initial guess is being considered.

SECTION  7.5  -  PARTIAL  WAVEFORM  CONVERGENCE 

If  the  WR  algorithm  is  used  to  compute  the  time  domain  behavior  for  very  large  circuits,  it  is 
often  the  case  that  some  pieces  of  the  circuit  will  converge  much  more  rapidly  than  others.  The 
overall  efficiency  of  the  WR  method  can  be  improved  if  the  waveforms  that  have  already  converged 
are  not  recomputed  every  subsequent  iteration. 

Taking advantage of partial waveform convergence requires only a simple modification to Algorithm 4.1. Before giving the exact algorithm, we present the following useful definition.

Definition 7.1: Let

$$\sum_{j=1}^{n} C_{ij}(v(t), u(t))\,\dot{v}_j(t) + f_i(v(t), u(t)) = 0, \qquad v(0) = v_0 \qquad [7.9]$$

be the $i$-th equation of the system in Eqn. (2.2). We say $v_j(t)$ is an input to this equation if there exist some $\alpha, t \in \mathbb{R}$ and $z, y \in \mathbb{R}^n$ such that

$$\sum_{l=1}^{n} C_{il}(z, u(t))\,y_l \neq \sum_{l=1}^{n} C_{il}(z + \alpha e_j, u(t))\,y_l \quad \text{or} \quad f_i(z, u(t)) \neq f_i(z + \alpha e_j, u(t)),$$

where $e_j$ is the $j$-th unit vector. The input set for the $i$-th equation is the set of $j \in \{1, \ldots, n\}$ such that $v_j(t)$ is an input. ■

The WR algorithm is then modified slightly using this notion of the set of inputs to a given ODE.

Algorithm 7.5 - WR Algorithm with Partial Waveform Convergence

The superscript k denotes the iteration count, the subscript i denotes the component index of a vector, and $\epsilon$ is a small positive number.

k = 0
guess waveform $v^0(t)$, $t \in [0,T]$, such that $v^0(0) = v_0$
    (for example, set $v^0(t) = v_0$, $t \in [0,T]$)
repeat {
    k = k + 1
    foreach ( i in N ) {
        Partialflag = TRUE
        if ( k == 1 ) Partialflag = FALSE
        foreach ( $j < i$, $j \in$ input set of $v_i$ )
            if ( $\max_{t \in [0,T]} |v_j^{k}(t) - v_j^{k-1}(t)| > \epsilon$ ) Partialflag = FALSE
        foreach ( $j > i$, $j \in$ input set of $v_i$ )
            if ( $\max_{t \in [0,T]} |v_j^{k-1}(t) - v_j^{k-2}(t)| > \epsilon$ ) Partialflag = FALSE
        if ( Partialflag == TRUE ) $v_i^{k} = v_i^{k-1}$
        else solve

$$\sum_{j=1}^{i} C_{ij}(v_1^{k}, \ldots, v_i^{k}, v_{i+1}^{k-1}, \ldots, v_n^{k-1}, u)\,\dot{v}_j^{k} + \sum_{j=i+1}^{n} C_{ij}(v_1^{k}, \ldots, v_i^{k}, v_{i+1}^{k-1}, \ldots, v_n^{k-1}, u)\,\dot{v}_j^{k-1} + f_i(v_1^{k}, \ldots, v_i^{k}, v_{i+1}^{k-1}, \ldots, v_n^{k-1}, u) = 0$$

            for ( $v_i^{k}(t)$; $t \in [0,T]$ ), with the initial condition $v_i^{k}(0) = v_{0,i}$.
    }
} until ( $\max_{1 \le i \le n} \max_{t \in [0,T]} |v_i^{k}(t) - v_i^{k-1}(t)| < \epsilon$ )
that is, until the iteration converges.
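Algorithm 7.5 can be sketched for the special case of a linear circuit with unit grounded capacitors, $\dot v = A v$ (so $C = I$ and $f_i(v) = -\sum_j a_{ij} v_j$). In that case, the input set of Definition 7.1 for node $i$ is simply the set of $j \neq i$ with $a_{ij} \neq 0$. As a simplification, each node equation is integrated with backward Euler on a fixed grid, rather than with the trapezoidal rule and variable timesteps used in RELAX2.3. The function name and its return values are hypothetical.

```python
def wr_partial(A, v0, h, steps, eps=1e-9, max_sweeps=200):
    """Gauss-Seidel WR for dv/dt = A v (unit grounded capacitors), skipping
    node i when none of its inputs changed by more than eps.

    Each scalar node equation is integrated with backward Euler on a fixed
    grid of `steps` steps of size h. Returns (waveforms, sweeps, skipped).
    """
    n = len(v0)
    # Input set of node i: the other nodes it depends on (nonzero A[i][j]).
    inputs = [[j for j in range(n) if j != i and A[i][j] != 0.0]
              for i in range(n)]

    def change(a, b):
        return max(abs(x - y) for x, y in zip(a, b))

    older = None                                      # iterate k-2
    prev = [[v0[i]] * (steps + 1) for i in range(n)]  # iterate k-1 (guess)
    skipped = 0
    for k in range(1, max_sweeps + 1):
        cur = [None] * n                              # iterate k
        for i in range(n):
            partial = k > 1
            for j in inputs[i]:
                if j < i:
                    moved = change(cur[j], prev[j])
                else:
                    moved = change(prev[j], older[j]) if older else float("inf")
                if moved > eps:
                    partial = False
            if partial:
                cur[i] = prev[i]  # inputs have settled: reuse the waveform
                skipped += 1
                continue
            # Newest available inputs: iterate k for j < i, k-1 for j > i.
            w = [cur[j] if j < i else prev[j] for j in range(n)]
            x = [v0[i]]
            for t in range(steps):
                drive = sum(A[i][j] * w[j][t + 1] for j in inputs[i])
                x.append((x[t] + h * drive) / (1.0 - h * A[i][i]))
            cur[i] = x
        converged = max(change(cur[i], prev[i]) for i in range(n)) < eps
        older, prev = prev, cur
        if converged:
            return prev, k, skipped
    raise RuntimeError("WR did not converge")


# A one-way chain (node 0 drives node 1) plus an independent node 2.
A = [[-1.0, 0.0, 0.0], [1.0, -1.0, 0.0], [0.0, 0.0, -2.0]]
wave, sweeps, skipped = wr_partial(A, [1.0, 0.0, 1.0], 0.01, 100)
print(sweeps, skipped)
```

In the chain example, the ordering follows the signal flow, so the first sweep is already exact. The second sweep skips all three nodes, because no input moved.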


SECTION  7.6  -  EXPERIMENTAL  RESULTS 

The degree to which the WR algorithm improves circuit simulation efficiency can be traced to two properties of a circuit. The first, mentioned before, is the difference in the rates of change of the voltages in the system. This determines how much efficiency is gained by solving the subsystems with independent integration timesteps. The second is the amount of coupling between the subsystems. If the subsystems are tightly coupled, then many relaxation iterations will be required to achieve convergence, and the advantage gained by solving each subsystem with its own timestep will be lost. To show this interaction for a practical example, we will use the RELAX2.3[13] program to compare the computation time required to simulate a 141-node CMOS memory circuit using standard direct methods and using the WR algorithm. To demonstrate the effect of tighter coupling, the CMOS memory circuit will be simulated using several values of a parameter XQC. XQC is the fraction of the gate oxide capacitance that is treated as gate-drain or gate-source overlap capacitance.
TABLE 7.3 - DIRECT VS WR ON A MEMORY CIRCUIT WITH DIFFERENT COUPLINGS

METHOD    XQC     TIMEPOINTS    # WR ITERS    CPU TIME
Direct    0.01                  1             933s
WR        0.01                  2.5           304s
Direct            122,988       1
WR                19,199        3
Direct    0.2     118,335       1
WR        0.2     19,193        4
Direct    0.33    115,233       1             895s
WR        0.33    19,315        6.5           707s

The  results  in  Table  7.3  are  exactly  as  expected.  As  the  coupling  increases,  the  number  of  WR 
iterations  required  increases,  and  the  difference  in  the  simulation  time  for  WR  and  the  direct  method 
decreases. 

It is possible to verify, for this example, our claim about the source of WR's efficiency. Consider the number of timepoints computed by the direct method versus the number of timepoints computed by the WR method in the final iteration. Comparing these two numbers puts a bound on the maximum speed increase that can be achieved by solving different subsystems using different timesteps. (Note that we consider only the number of timepoints computed by the WR method in the final iteration, because we are interested only in the number of timepoints needed to accurately represent the given waveform.)

The total number of timepoints computed for each of the simulation cases of the memory circuit example is also given in Table 7.3. This number is the sum of the computed timepoints over all the waveforms in the circuit. Suppose most of the efficiency of a decomposition method stems from solving each of the subsystems with its own timestep. Then the maximum improvement that could be gained from a decomposition integration method would be the ratio of the number of timepoints computed using the direct method to the number of timepoints computed in the final WR iteration. As can be seen from Table 7.3, for the CMOS memory example this ratio is approximately six. To compute the actual efficiency of the WR method, the average number of WR iterations performed must also be considered, because the set of timepoints is recomputed at each WR iteration. If our claims above are correct, dividing the average number of relaxation iterations by this timepoint ratio should give almost the ratio of WR computation time to direct computation time. And as Table 7.3 shows, it does.

In the above analysis we have ignored an important advantage of relaxation methods: they avoid large matrix solutions. Ignoring it is reasonable for the above example, because the matrix operations account for only a small percentage of the computation, even when direct methods are used. However, for much larger problems, on the order of several thousand nodes, the time to perform the large matrix solutions required by direct methods will dominate. In those cases, WR methods should compare even more favorably because they avoid these large matrix solutions.

Finally,  in  Table  7.4,  we  present  several  circuits  that  have  been  simulated  using  RELAX2.3 
with  direct  and  WR  methods. 


TABLE 7.4 - DIRECT METHODS VS WR FOR SEVERAL INDUSTRIAL CIRCUITS

Circuit          Devices    DIRECT      WR
uP Control       232        90s*        45s*
CMOS Memory      621        995s*       308s*
4-bit Counter    259        540s*       299s*
Inverter Chain   250        98s**       38s**
Digital Filter   1082       1800s*      520s*
Encode-Decode    3295       5000s*      1500s*
VHSIC Memory     625        17174s**    12505s**

*On VAX 11/780 running Unix using Shichman-Hodges MOSFET model
**On VAX 11/780 running VMS using Yang-Chatterjee MOSFET model
CHAPTER 8 - PARALLEL WR ALGORITHMS

Exploiting parallel computation for circuit simulation is extremely important. The size of the circuits to which circuit simulation has been applied has grown at a rate that far exceeds the increase in computational power due to technological improvement. The only way to keep pace with the increasing demand is to apply many processors to the problem, and the number of processors that can be used must scale up with the size of the problem.

A variety of techniques for the parallel solution of ordinary differential equations have been examined in the literature[63]. For circuit simulation, four techniques have been applied:

- The SPICE2 program was rewritten to take advantage of the Cray computer's vector capability[48].
- A parallel version of a similar direct method has been implemented on the Cosmic-Cube, a message-passing parallel computer.
- The Gauss-Jacobi form of the algebraic relaxation-Newton algorithm presented in Section 3.2 has been implemented on both a shared-memory computer, the Sequent Balance 8000[64], and Thinking Machines' Connection Machine[65].
- A version of the Iterated Timing Analysis algorithm (Section 3.2) has been implemented on the BBN Butterfly[34].

In this chapter, we describe the implementation of two WR-based parallel circuit simulation algorithms on a shared-memory computer. We start with a brief overview of the aspects of a shared-memory computer that affect the algorithm implementation. We then describe the two parallel WR algorithms. One is based on a mixture of Gauss-Seidel and Gauss-Jacobi relaxation, and the other on pipelining the waveform computation. Experimental results are presented for each algorithm.

SECTION  8.1  -  A  BRIEF  OVERVIEW  OF  THE  SHARED  MEMORY  COMPUTER 

When attempting to write efficient programs for serial computers, knowledge about the specific details of the architecture is useful, but not essential. This is not the case for programming on a parallel computer. Specific details about the architecture can influence decisions about the implementation of an algorithm, and can even affect the choice of algorithm. The algorithms described below were implemented on the Sequent Balance 8000, a shared-memory parallel computer. In this section we therefore describe those aspects of its architecture that affected the implementation of parallel versions of the WR algorithm. For a more detailed treatment of this subject, see [56].

The key problem in designing a parallel processor is that of communication between the processors. One simple approach is to build a parallel computer by gathering together many standard serial computers and connecting them with a communication network. Such computers are usually referred to as message-passing parallel computers, because data is transferred between the processors by passing messages on the communication network. The disadvantage of such a system is that moving data from the memory of one processor into the memory of a second processor requires the involvement of both the transmitting and the receiving processors.

Another approach to the problem of communicating between parallel processors is to redesign the memory system so that the aggregate memory of all the processors is directly addressable by any one of the individual processors. Such a system is referred to as a shared-memory system because the processors all share a single resource, the memory. The main advantage of a shared-memory machine is that it is not necessary to explicitly transfer data from one processor to another. When a processor needs data from another processor, it simply reads from the memory locations to which the other processor has written. This also allows for more dynamic algorithm structures, because it is not necessary to determine beforehand which processors will need the results of a given calculation. The shared-memory computer has two disadvantages. All processors must contend for a single resource, the memory, and guaranteed synchronization between processors is not simple without special-purpose hardware.

One of the most important aspects of a shared-memory parallel computer is how the memory is distributed among the individual processors. There are fundamentally only two choices. In the first, each processor has a portion of the shared memory that it can access rapidly and that the other processors can access, but not as quickly. In the second, all the memory is centralized, and the many processors contend on an equal footing for access to it.

If the memory is distributed among the processors, then a parallel algorithm will perform better if the data for the computation can be partitioned so that each processor performs computations using only the data in its own portion of the shared memory. Usually, however, such partitioning makes some of an algorithm's parallelism unexploitable, because each processor can then work only on an exclusive portion of a large problem. Parallel efficiency is lost as a result. For example, suppose that at some point in solving a large problem, several calculations that could be performed concurrently all require data from the same partition. Those calculations will then be performed serially. If, at the same time, there are no calculations to be performed using data from another partition, a processor will be idle.

One way of eliminating this loss of parallelism, at the cost of complicating the control structure of the program, is to have each processor use a priority scheme. In such a scheme, each processor attempts to perform calculations using data in its own partition. If there are none to be performed, the processor attempts to perform calculations using data from other partitions.

Clearly, when using a shared-memory computer with distributed memory, the trade-offs among faster memory access, loss of parallelism, and a more complicated control structure must be examined carefully (for an example in the case of circuit simulation, see [34]).

The  memory  on  the  parallel  computer  used  for  parallel  WR  experiments  is  centralized,  where 
all  the  processors  contend  for  one  large  shared  memory.  For  such  an  architecture,  there  is  no  ad¬ 
vantage  to  partitioning  the  data  for  a  large  problem  among  the  processors,  as  they  will  still  have  to 
contend  for  the  same  centralized  memory  pool.  For  this  reason,  the  algorithms  presented  below  ig¬ 
nore  the  issue  of  partitioning  the  data  among  many  processors. 

Having many processors contend for data from the same central memory creates an obvious bottleneck. To reduce this contention, most implementations of shared-memory computers with centralized memory include a large cache memory with each of the processors. As with any cache memory scheme, these caches exploit locality of reference: usually, each processor is actively using only a small amount of data. Since this data will probably be available from the cache, most memory accesses will not need to generate a request to the main central memory.

Using caches on a parallel computer is not as straightforward as on a serial computer. There are many caches, all of which are supposed to hold copies of the data in the central memory, and any processor can write to any memory location. It is therefore possible for the caches to lose consistency, meaning that the contents of a cache may not reflect the current contents of the central memory. For example, suppose the contents of memory location A are in both the cache for processor 1 and the cache for processor 2. If processor 1 updates A, then the data in the cache for processor 2 will be incorrect.

As the example demonstrates, an inconsistency can occur in the cache of another processor even if the central memory is updated whenever a processor updates a location contained in its cache. There are a variety of schemes for avoiding this problem[62]. We will mention only the technique applied in the computer used for experimentation. The scheme is simple: all the caches monitor all the writes to central memory from any of the processors. If a cache contains a location being written to by any of the other processors, it updates its own copy of the data in that location. By snooping on the writes to central memory, each cache ensures that it has the most current data.

The  snooping  cache  consistency  strategy  has  a  particularly  useful  implication.  It  is  frequently 
necessary  to  have  one  processor  wait  for  another  processor  to  finish  a  computation.  If  the  computing 
processor is to change a location in memory when finished, the second processor can continuously test
that  location  to  determine  when  the  computing  processor  is  finished.  Normally,  this  is  a  poor  ap¬ 
proach  for  a  parallel  environment,  because  the  waiting  processor  will  be  continuously  reading  from 
the  central  memory  and  generating  excess  memory  traffic.  If  many  processors  are  waiting  for  the 
completion  of  one  processor’s  computation,  this  excess  traffic  can  become  enormous,  and  slow  the 
computing  processor  which  will  have  to  contend  with  the  excess  traffic.  If  the  cache  architecture 
described  above  is  used,  the  excess  traffic  is  eliminated.  Each  waiting  processor  will  keep  rereading 
a  location  which  will  be  in  its  own  cache,  and  will  therefore  not  be  generating  any  central  memory 
traffic.  When  the  computing  processor  finishes,  each  of  the  other  processor  caches  will  spot  the  write 
to  the  monitored  location  in  central  memory  and  each  cache  will  update  its  own  copy  of  the  data. 
The  waiting  processors  will  therefore  be  made  immediately  aware  of  the  completion  of  the  computing 
processor,  but  will  not  have  impeded  the  progress  of  the  computing  processor  by  generating  excess 
memory  traffic. 
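The waiting pattern described above can be sketched with Python threads. One thread computes a result and then writes a single completion flag, while several waiters repeatedly read that flag. Python threads do not model per-processor caches, so this shows only the control structure. `Done`, `compute`, and `wait_for` are hypothetical names, and the sum of squares stands in for a real computation.

```python
import threading
import time


class Done:
    """Completion flag plus result, written by the computing thread."""

    def __init__(self):
        self.flag = False
        self.result = None


def compute(shared, data):
    # The result is stored before the flag, so a waiter that sees the flag
    # also sees the result (true under CPython; hardware may need a barrier).
    shared.result = sum(x * x for x in data)  # stand-in computation
    shared.flag = True                          # the one write waiters watch


def wait_for(shared):
    # Spin on the flag. On a snooping-cache machine each re-read would hit
    # the waiter's own cache until the computing processor's write arrives.
    while not shared.flag:
        time.sleep(0)  # yield so the computing thread can make progress
    return shared.result


shared = Done()
results = []
waiters = [threading.Thread(target=lambda: results.append(wait_for(shared)))
           for _ in range(3)]
for t in waiters:
    t.start()
compute(shared, range(10))
for t in waiters:
    t.join()
print(results)
```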

The  last  aspect  of  the  parallel  computer  architecture  that  we  will  consider  is  that  of  mutual 
exclusion  or  locking.  In  almost  all  parallel  programs  there  are  critical  sections  that  must  be  performed 
serially; that is, only one processor should be executing the section at a time. The usual mechanism for ensuring this is the test-and-set instruction. If a processor executes a test-and-set instruction on a given location in memory, the contents of the location are returned to the processor. Simultaneously, if the location was not set, it is set.

The  mechanism  can  be  used  to  perform  locking  as  follows.  A  particular  location  in  memory  is 
used  as  the  lock.  If  a  processor  is  about  to  execute  a  critical  section  of  a  parallel  program,  it  first  ex¬ 
ecutes  a  test-and-set  on  the  lock  location.  If  the  result  indicates  that  the  location  was  not  set,  then 
the  processor  can  safely  execute  the  critical  section,  and  clear  the  lock  location  when  finished.  If  the 
result indicates that the lock was already set, the processor must wait until the lock becomes
clear  and  then  retry  the  test-and-set. 
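The locking protocol can be sketched as a spin lock in Python. Python has no test-and-set instruction, so the sketch assumes one and emulates its indivisibility with an internal `threading.Lock`. The part that mirrors the text is `acquire`: when the lock is already set, it waits by plainly reading the lock location, and retries the test-and-set only after the lock clears. Waiting by reading, rather than by repeated test-and-set, is what lets snooping caches of the kind described above absorb the spinning. `SpinLock` is a hypothetical name.

```python
import threading
import time


class SpinLock:
    """Spin lock built on an emulated test-and-set primitive."""

    def __init__(self):
        self._flag = False               # the lock location
        self._atomic = threading.Lock()  # stand-in for hardware atomicity

    def _test_and_set(self):
        """Return the old value of the lock location and set it."""
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def acquire(self):
        while self._test_and_set():  # lock was already set
            while self._flag:        # wait, only reading, until it clears
                time.sleep(0)        # let other threads run
        # the test-and-set found the lock clear, so this thread now holds it

    def release(self):
        self._flag = False           # clear the lock location when finished


lock = SpinLock()
tally = [0]


def worker(n):
    for _ in range(n):
        lock.acquire()
        tally[0] += 1                # critical section
        lock.release()


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(tally[0])
```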

SECTION 8.2 - A MIXED GAUSS-SEIDEL/JACOBI PARALLEL WR ALGORITHM

An  obvious  way  of  parallelizing  WR  is  to  use  the  Gauss-Jacobi  version  of  WR.  In  this  algo¬ 
rithm,  the  relaxation  makes  use  of  the  waveforms  computed  at  the  previous  iteration  for  all  the  sub¬ 
systems.  In  this  case,  all  the  subsystems  can  be  analyzed  independently  by  different  processors.  One 
of the difficulties in applying this algorithm is that MOS digital circuits are highly directional, and, as mentioned in Section 7.2, slow convergence may result if this directionality is not exploited. For example, consider applying WR to compute the transient response of a chain of inverters. Suppose the first inverter's output is computed first and used to compute the second inverter's output, which is then used for the third inverter, and so on. The resulting waveforms for this first iteration of the WR algorithm will then be very close to the correct solution. However, if the second and third inverter outputs are computed in parallel with the first inverter's output, the results will not be close to the correct solution, because no reasonable guess for the second and third inverter inputs will be available.
For  this  reason,  after  partitioning,  the  RELAX2.3  program  orders  the  subcircuits  so  that  the 
directionality  of  the  circuit  is  followed  as  closely  as  possible. 

Following a strict ordering of the relaxation computation (Gauss-Seidel) does not allow entire
waveforms to be computed in parallel, while computing the next-iteration waveforms for every
subcircuit at once (Gauss-Jacobi) allows for substantial parallelism but is less efficient, because it
converges more slowly. To preserve the efficiency of the Gauss-Seidel algorithm while gaining some of
the parallelism of Gauss-Jacobi, a mixed approach can be employed. The mixed approach is based on the
observation that large digital circuits contain many subsystems that can be computed in parallel
without slowing the convergence, because large digital circuits tend to be wide: rather than being
like a long chain of gates, they are like many parallel chains, with some interaction between the
chains. It is therefore possible to order the computation so that subcircuits in parallel "chains" can
be computed in parallel while the serial dependence inside each chain is preserved. This will not allow
as much parallelism as the Gauss-Jacobi scheme, but should preserve most of the efficiency of the
Gauss-Seidel scheme.
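The mixed scheme can be sketched as a shared, ordered work queue, which is also the structure of Algorithm 8.1. In this sketch Python threads stand in for processors, a `threading.Lock` guards the queue, and each subcircuit is a toy backward-Euler lag stage (dx/dt = (1 - v_in) - x) rather than a MOS subcircuit; `run_mixed_wr`, `fanin` and the chain names are hypothetical. Every sweep, workers take subcircuits in queue order and integrate them with whatever input waveform is newest in shared storage, so whether a subcircuit sees its driver's current-sweep waveform depends on timing, which is the probabilistic behavior of the mixed approach.

```python
import threading
from collections import deque


def integrate_stage(x0, vin, h):
    """Backward-Euler transient of one toy inverter stage: dx/dt = (1 - vin(t)) - x."""
    x = [x0]
    for n in range(1, len(vin)):
        x.append((x[n - 1] + h * (1.0 - vin[n])) / (1.0 + h))
    return x


def run_mixed_wr(order, fanin, x0, n_workers, steps=100, h=0.1, tol=1e-9, max_sweeps=50):
    """Queue-based WR sweeps: workers take subcircuits in `order` and use the newest inputs."""
    source = [0.0] + [1.0] * steps
    waves = {s: [x0[s]] * (steps + 1) for s in order}   # shared waveform storage
    for sweep in range(1, max_sweeps + 1):
        queue = deque(order)                # reset the ordered queue for this sweep
        queue_lock = threading.Lock()       # mutual exclusion on the queue only
        change = {}

        def worker():
            # Uses this sweep's queue, lock and change dict; threads are joined before the next sweep.
            while True:
                with queue_lock:
                    if not queue:
                        return              # queue exhausted: this worker is done for the sweep
                    sub = queue.popleft()
                src = fanin[sub]
                vin = source if src is None else waves[src]  # whatever is newest right now
                new = integrate_stage(x0[sub], vin, h)
                change[sub] = max(abs(a - b) for a, b in zip(new, waves[sub]))
                waves[sub] = new            # each waveform is written by its own subcircuit only

        workers = [threading.Thread(target=worker) for _ in range(n_workers)]
        for t in workers:
            t.start()
        for t in workers:
            t.join()                        # all subcircuits have been recomputed once
        if max(change.values()) < tol:
            return sweep, waves
    raise RuntimeError("WR did not converge in max_sweeps sweeps")


# Two independent chains a1 -> a2 -> a3 and b1 -> b2, interleaved in the queue order.
order = ["a1", "b1", "a2", "b2", "a3"]
fanin = {"a1": None, "a2": "a1", "a3": "a2", "b1": None, "b2": "b1"}
x0 = {"a1": 1.0, "a2": 0.0, "a3": 1.0, "b1": 1.0, "b2": 0.0}
sweeps, waves = run_mixed_wr(order, fanin, x0, n_workers=2)
print(sweeps)
```

With one worker the queue order reproduces Gauss-Seidel and two sweeps suffice. With more workers a subcircuit may be picked up before its driver has finished, so an extra sweep can be needed, but the converged waveforms are the same, and for these chains no more than four sweeps are needed (the longest chain has three stages).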

In Algorithm 8.1, we present a probabilistic approach to following the ordering of the subcircuits.
The algorithm is set up by establishing both the space in shared memory for storage of the iteration
waveforms and a buffer or queue with the list of subcircuits in the order derived from Algorithm 7.3.
Each of the processors then begins by taking a subcircuit from the queue and then
computing the subcircuit's output waveforms using the newest available external waveforms. When
the waveform computation is completed, the subcircuit is temporarily discarded and the processor
takes the next subcircuit off the queue. This continues until the queue is exhausted and all the
processors are finished. Then the queue is reset, and the processors all start picking up subcircuits
again.

This algorithm is probabilistic in the sense that there is no guarantee that the transient computation
for a given subcircuit will be finished before its output is needed by another subcircuit that is
strongly serially dependent on the first subcircuit's output. The given subcircuit's output is likely
to have been computed if the circuit is very wide (there are a large number of parallel chains)
compared to the number of processors. In addition, since all the subcircuit outputs must be computed
before any subcircuit's output is recomputed, no subcircuit will be more than one iteration behind.

Algorithm 8.1 - (Jacobi/Seidel based Parallel WR)

Initialization. Both subcircuits and waveforms in shared memory.
queue = ordered_list_of_subcircuits

while ( all_converged == FALSE ) {    /* Parallel iteration loop. All processors execute. */
    if ( processor == 1 ) {
        reset_queue()
        idle_count = 0
    }
    while ( idle_count != number_of_processors ) {
        while ( test-and-set(queuelock) == set ) { /* Tight loop waiting for queue to unlock. */ }
        /* Queue is locked; get next subcircuit. */
        NextSub = Get_next_queue_entry()
        if ( NextSub == NULL ) {
            increment(idle_count)
            clear(queuelock)
            /* This processor is idle; wait until every processor has finished the sweep. */
            while ( idle_count != number_of_processors ) { }
        }
        else {    /* There is another subcircuit on the queue. */
            clear(queuelock)
            Compute_Subcircuit_Waveforms(NextSub)
            Check_Waveform_Convergence(NextSub)
        }
    }
}

Note that the attributes of the parallel architecture have been considered in Algorithm 8.1. Since
the machine is a centralized shared-memory machine, the data describing the subcircuits and the
computed waveforms are left in shared memory, to be accessed as needed. Also note that each of the
processors waits for the queue to be free by examining the lock variable in a tight loop. As mentioned
above, this exploits the nature of the cache consistency strategy. Finally, in this case it is not
necessary to separately control access to the waveforms. Since the waveforms will only be written as a
result of the computations performed on their associated subcircuits, and a waveform is associated
with only one subcircuit (this would not be the case if an overlapped relaxation algorithm were used),
the mutual exclusion of the subcircuit queue will prevent waveform writes from colliding.


SECTION 8.3 - TIMEPOINT-PIPELINING WR ALGORITHM

It is possible to parallelize the WR algorithm while still preserving a strict ordering of the
computation of the subcircuit waveforms (Gauss-Seidel) by pipelining the waveform computation. In this
approach, one processor starts computing the transient response for a subcircuit. Once a first
timepoint is generated, a second processor begins computing the first timepoint for the second
subcircuit, while the first processor computes the second timepoint for the first subcircuit. On the
next step a third processor is introduced to compute the first timepoint for the third subcircuit,
and so on.

Conceptually, the operations of a given processor in a parallel timepoint pipelining algorithm are
quite simple. The algorithm is set up by establishing both the space in shared memory for storage of
the iteration waveforms and a buffer or queue with the list of subcircuits. Each of the processors
then starts by taking a subcircuit from the queue and examining that subcircuit's external waveforms to
see if the waveform values needed to compute the next integration timestep are available. If so, the
next timestep for the subcircuit is computed. Otherwise, the subcircuit is returned to the queue and
the processor tries again with another subcircuit from the queue. As timepoints are computed, more of
the subcircuits will have the information needed to compute their own timepoints.

As one might expect, a practical timepoint pipelining algorithm is more complicated than the
conceptual algorithm. Perhaps the most obvious difficulty is the tremendous overhead of having every
processor search through all the subcircuits to find one of the few for which a timepoint can be
computed. It is possible to reduce the number of candidate subcircuits a processor must search
by only considering those subcircuits for which at least one of the external waveforms has more
timepoints than it had when the subcircuit was last processed. This avoids having the processors
continually recheck subcircuits for which no new information is available and for which, therefore,
no new timestep could be computed.

This  kind  of  selective  search  algorithm  can  be  implemented  by  altering  the  way  the  queue  of 
subcircuits  is  used.  When  a  processor  discerns  that  it  is  not  possible  to  compute  a  new  timepoint  for 
a  subcircuit,  instead  of  returning  the  subcircuit  to  the  queue,  the  subcircuit  is  temporarily  discarded. 
If  a  processor  succeeds  in  computing  a  timepoint  for  a  subcircuit,  those  subcircuits  that  are  connected 
to the given subcircuit, referred to as the fanouts of the subcircuit, are added to the queue (of
course, fanouts that are already on the queue are not duplicated). In this way, the only subcircuits
that  will  be  on  the  queue  are  those  for  which  it  is  likely  that  the  waveform  values  needed  to  compute 
a  next  timepoint  will  be  available. 
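The fanout-driven queue can be exercised with a small round-by-round simulation. It is a sketch under simplifying assumptions, not the RELAX2.3 implementation: the circuit is a single chain, one processor advances one subcircuit by one timepoint per round, the cost of searching the queue is ignored, and a subcircuit that has just advanced is put back on the queue to try its next timepoint (standing in for the repeat loop of Algorithm 8.2). A subcircuit that cannot advance is dropped and returns only when its fanin produces a new timepoint. `pipeline_rounds` is a hypothetical name.

```python
from collections import deque


def pipeline_rounds(n_subckts, n_steps, n_procs):
    """Rounds needed to pipeline n_steps timepoints through a chain of n_subckts subcircuits."""
    done = [0] * n_subckts                    # timepoints computed so far, per subcircuit
    fanouts = [[k + 1] if k + 1 < n_subckts else [] for k in range(n_subckts)]
    queue, queued = deque([0]), {0}           # queued mirrors the queue to avoid duplicates

    def can_step(k):
        # The next timepoint of k needs the fanin's value at that timepoint (k = 0 is source-driven).
        return done[k] < n_steps and (k == 0 or done[k - 1] > done[k])

    rounds = 0
    while min(done) < n_steps:
        rounds += 1
        chosen = []
        while queue and len(chosen) < n_procs:    # each processor looks for one subcircuit
            k = queue.popleft()
            queued.discard(k)
            if can_step(k):
                chosen.append(k)                  # otherwise it is temporarily discarded
        if not chosen:
            raise RuntimeError("no subcircuit on the queue can advance")
        for k in chosen:
            done[k] += 1                          # compute one timepoint
        for k in chosen:                          # enqueue fanouts, then the subcircuit itself
            for j in fanouts[k] + [k]:
                if j not in queued and done[j] < n_steps:
                    queue.append(j)
                    queued.add(j)
    return rounds


print(pipeline_rounds(3, 4, 1), pipeline_rounds(3, 4, 3))  # 12 6
```

For a chain of N subcircuits with S timepoints each, the simulation uses N*S rounds on one processor and S + N - 1 rounds once there are at least N processors, the familiar fill-and-drain count of a pipeline.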

Another  aspect  of  the  timepoint  pipelining  algorithm  that  increases  the  exploitable  parallelism 
at  the  cost  of  slightly  complicating  the  algorithm  is  to  allow  the  timepoint  pipelining  to  extend  across 
iteration  boundaries.  For  example,  consider  a  chain  of  two  inverters,  and  assume  that  it  takes  two 
timesteps  to  compute  each  of  the  inverter  outputs.  As  before,  the  second  timestep  of  the  first  inverter 
can  be  computed  in  parallel  with  the  computation  of  the  first  timestep  of  the  second  inverter.  Then, 
while the second timestep of the second inverter is being computed, there is enough information
to  compute  the  first  timestep  of  the  first  inverter  for  the  second  WR  iteration. 
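The two-inverter example can be checked by computing, for unlimited processors, the earliest round in which each task (stage, WR iteration, timepoint) can run. The dependency rule is an assumption chosen to match the example: a timepoint needs the same stage's previous timepoint, the upstream stage's value at that timepoint in the current iteration (Gauss-Seidel), and, from the second iteration on, the downstream stage's value at that timepoint from the previous iteration, modeling the load that the second inverter presents to the first. `earliest_rounds` is a hypothetical helper.

```python
from functools import lru_cache


def earliest_rounds(n_stages, n_steps, n_iters):
    """Earliest round for every task (stage, iteration, timepoint), assuming unlimited processors."""

    @lru_cache(maxsize=None)
    def r(k, it, n):
        deps = []
        if n > 1:
            deps.append(r(k, it, n - 1))        # same stage, previous timepoint
        if k > 1:
            deps.append(r(k - 1, it, n))        # upstream stage, current iteration (Gauss-Seidel)
        if it > 1 and k < n_stages:
            deps.append(r(k + 1, it - 1, n))    # downstream stage, previous iteration (assumed load coupling)
        return 1 + max(deps, default=0)         # a task runs one round after its last input is ready

    return {(k, it, n): r(k, it, n)
            for k in range(1, n_stages + 1)
            for it in range(1, n_iters + 1)
            for n in range(1, n_steps + 1)}


sched = earliest_rounds(2, 2, 2)                 # two inverters, two timesteps, two WR iterations
print(sched[(2, 1, 2)], sched[(1, 2, 1)], max(sched.values()))  # 3 3 5
```

Under this rule the first timepoint of the first inverter in iteration 2 lands in round 3, alongside the second timepoint of the second inverter in iteration 1, and both iterations finish in round 5 rather than the 6 rounds needed if iteration 2 waited for iteration 1 to complete.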

This enhancement does not really complicate the conceptual algorithm until one considers the question
of when to stop. For a long chain of inverters, allowing the pipelining to extend across iteration
boundaries can easily allow the first inverter to be many iterations ahead of the last inverter. Since
WR convergence can only be determined when all the waveforms for a given iteration have been computed,
the WR iteration being computed for the first inverter may well be many iterations beyond what is
necessary to achieve satisfactory convergence. The difficulty is that this fact will not be discovered
until much later, when all inverter outputs have been computed for the iteration at which satisfactory
convergence is achieved.

This is not a disastrous problem: the algorithm will still produce correct solutions, but unnecessary
computations will be performed and efficiency will be degraded. The unnecessary computations are
reasonably simple to avoid, by not allowing any subcircuit to start on iteration N+1 until
nonconvergence of some waveform of iteration N has been detected. It is, of course, important to
discover as quickly as possible whether it will be necessary to compute iteration N+1, so that the
pipelining of that iteration can begin. For this reason, in the timepoint pipelining algorithm
presented below, convergence is checked on a timepoint-by-timepoint basis, immediately after each
timepoint is computed.
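The rule for bounding the pipelining, together with the timepoint-by-timepoint convergence check, can be sketched as a small class. This is a single-threaded illustration; in a shared-memory implementation the update of `max_iter_so_far` would itself have to be protected, for example by the queue lock. `IterationGate` and its methods are hypothetical names, and the scalar comparison against a tolerance stands in for whatever convergence test the simulator actually applies.

```python
class IterationGate:
    """Single-threaded sketch of the max_iter_so_far rule used to stop runaway pipelining."""

    def __init__(self):
        # Iteration after the last one in which nonconvergence was detected; iteration 1 is always needed.
        self.max_iter_so_far = 1

    def may_work_on(self, iteration):
        """Work is allowed only on iterations no later than max_iter_so_far."""
        return iteration <= self.max_iter_so_far

    def check_timepoint(self, iteration, new_value, previous_value, tol):
        """Check one freshly computed timepoint against the previous iteration's value."""
        converged = abs(new_value - previous_value) <= tol
        if not converged and iteration == self.max_iter_so_far:
            # Nonconvergence in the newest allowed iteration means one more iteration is needed.
            self.max_iter_so_far += 1
        return converged


gate = IterationGate()
print(gate.may_work_on(2))                       # False: nothing has failed to converge yet
gate.check_timepoint(1, 0.50, 0.00, 1e-3)        # timepoint differs from the initial guess
print(gate.may_work_on(2), gate.may_work_on(3))  # True False
```

Because the check runs as soon as each timepoint is computed, the first nonconverged timepoint of an iteration immediately releases the next iteration for pipelining.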

Algorithm 8.2 - (Timepoint Pipelining WR Algorithm)

Initialization. Both subcircuits and waveforms in shared memory.
queue = ordered_list_of_subcircuits
idle_count = 0
/* max_iter_so_far is the iteration after the last one for which nonconvergence was detected. */
max_iter_so_far = 1

/* Parallel iteration loop. All processors execute. */
i_am_idle = FALSE    /* Private to each processor. */
while ( idle_count != number_of_processors ) {    /* At least one processor is still computing. */
    while ( test-and-set(queuelock) == set ) { /* Tight loop waiting for queue to unlock. */ }
    /* Queue is locked. Get the next subcircuit in the queue for which the work that might be
       performed on it is for an iteration that is no more than one beyond the maximum iteration
       for which nonconvergence has been detected. */
    NextSub = Get_next_queue_entry(max_iter_so_far)
    if ( NextSub == NULL ) {
        if ( i_am_idle == FALSE ) {    /* Count each idle processor only once. */
            increment(idle_count)
            i_am_idle = TRUE
        }
        clear(queuelock)
    }
    else {
        /* There is a subcircuit on the queue whose iteration is not beyond max_iter_so_far. */
        if ( i_am_idle == TRUE ) {    /* This processor has found work again. */
            decrement(idle_count)
            i_am_idle = FALSE
        }
        clear(queuelock)
        /* Compute as many timepoints as possible with available waveform values. */
        repeat {
            /* Check whether the external values needed to compute the next timestep are available. */
            cando = Check_for_next_step(NextSub)
            if ( cando == TRUE ) {
                Compute_Next_Step(NextSub)
                converged = Check_Step_Convergence(NextSub)
                if ( (converged == FALSE) and (NextSub.iter_count == max_iter_so_far) ) {
                    /* Keep max_iter_so_far ahead of the nonconverged iterations. */
                    increment(max_iter_so_far)
                }
                enqueue_fanouts(NextSub)
            }
        } until ( cando == FALSE )
    }
}

SECTION  8.4  -  PARALLEL  ALGORITHM  TEST  RESULTS 

As mentioned above, the two algorithms were implemented on a 9-processor configuration of the Sequent
Balance 8000 computer (larger configurations are available). The results from several experiments for
the two algorithms are given in Tables 8.1 and 8.2. As the results for the Eprom and microprocessor
control circuits indicate, the timepoint pipelining algorithm makes much more efficient use of the
available processors. In fact, as Table 8.2 shows, the timepoint pipelining algorithm running on the
Balance 8000 runs substantially faster than the serial WR algorithm running on a Vax/780.

A second point should be made about the timepoint pipelining examples. It can be seen that the
speed-up does not remain linear up to nine processors, but starts to drop off. This is surprising
given the size of the examples, but not when the type of circuit being simulated is considered. For
the biggest example, the CMOS RAM, the partitioning algorithm produces approximately 75 subcircuits,
which would suggest that a speed-up of 75 should be obtainable, or at least approachable. This ignores
one of the features of the WR algorithm: only those portions of the circuit that are active
participate in the computation. For digital circuits, this is usually less than ten percent of the
circuit. This implies that for the CMOS RAM example, over any given interval roughly seven subcircuits
are active and involved in the computation, and therefore a speed-up of only about seven could be
expected.
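The estimate above is a simple bound, made explicit in the sketch below under the assumption that the speed-up is capped both by the number of processors and by the number of subcircuits active in a typical interval. `speedup_ceiling` is a hypothetical helper; it ignores queue contention and synchronization, so measured speed-ups should fall below it.

```python
def speedup_ceiling(n_subcircuits, active_percent, n_processors):
    """Upper bound on speed-up when only active subcircuits contribute parallel work."""
    active = n_subcircuits * active_percent / 100   # subcircuits active in a typical interval
    return min(n_processors, active)


# CMOS RAM example: about 75 subcircuits, under ten percent active, nine processors.
print(speedup_ceiling(75, 10, 9))   # 7.5
```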


Table 8.1 - G-S/G-J WR ON SEVERAL # OF PROCESSORS

                          Number of processors
Circuit        FETs      1      3      6      9
uP Control       66    595    338    270    259
Eprom           348    512    317    286    266
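Speed-up and efficiency follow directly from the run times in Table 8.1, as S(p) = T(1)/T(p) and E(p) = S(p)/p; because these are ratios, the time units cancel. The dictionary below simply transcribes the table, and `speedups` is a hypothetical helper.

```python
# Run times from Table 8.1, indexed by number of processors.
table_8_1 = {
    "uP Control": {1: 595, 3: 338, 6: 270, 9: 259},
    "Eprom": {1: 512, 3: 317, 6: 286, 9: 266},
}


def speedups(times):
    """Speed-up S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p for each processor count."""
    base = times[1]
    return {p: (base / t, base / t / p) for p, t in sorted(times.items())}


for circuit, times in table_8_1.items():
    for p, (s, e) in speedups(times).items():
        print(f"{circuit:10s} p={p}: speed-up {s:.2f}, efficiency {e:.2f}")
```

At nine processors this gives speed-ups of about 2.30 for the uP Control circuit and 1.92 for the Eprom, that is, efficiencies of roughly 26% and 21%.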
CHAPTER 9 - CONCLUSIONS

In this thesis, a wide variety of new theoretical and practical results relating to numerical
integration methods for circuit simulation problems have been presented. A novel property that can be
used to classify integration methods, that of domain of dependence, was introduced, and its importance
was demonstrated by example. A wide range of integration methods that have been used for circuit
simulation were then analyzed with respect to this and several other properties.

Next, the WR algorithm was introduced, and a new proof of WR convergence, one that demonstrates that
the WR algorithm is a contraction mapping in a particular norm, was presented. Extensions to the WR
algorithm, along with convergence theorems, were also presented. In addition, the interaction between
WR algorithms and multistep integration methods was considered in detail, and the first theorem
proving the convergence of the multirate discretized WR algorithm was presented.

The practical aspects of WR were examined using a new circuit simulation program, RELAX2.3. The novel
algorithms used by the program to partition large circuits and dynamically adjust the windows were
described, and results from the program on industrial circuits were presented. In addition, the
implementations of two WR-based parallel circuit simulation algorithms were presented along with
results.

There are several theoretical questions about WR that were only partially answered in this thesis. In
particular, research is needed to more thoroughly understand the nature of WR convergence under
discretization, and to characterize systems for which WR algorithms contract in uniform norm. In
addition, theoretical and practical work needs to be continued on breaking large systems into smaller
subsystems in such a way that relaxation algorithms converge rapidly.

There is also much work to be done to improve the speed and robustness of the WR algorithm. In
particular, more sophisticated partitioning algorithms should be devised. Also, the results on
parallel WR algorithms presented in this thesis are preliminary. Experiments should be carried out on
a variety of different architectures to investigate the relationships between algorithms and computer
architecture.